(57) Abstract

Compositions and methods for the detection and therapy of cancer are
disclosed. The compounds provided include human endogenous retroviral
sequences that are preferentially expressed in tumor tissue, as well as
polypeptides encoded by such nucleotide sequences. Vaccines and
pharmaceutical compositions comprising such compounds are also provided and
may be used, for example, for the prevention and treatment of cancer. The
polypeptides may also be used for the production of antibodies, which are
useful for diagnosing and monitoring the progression of cancer in a patient.
Description

COMPOSITIONS AND METHODS FOR THE TREATMENT 
AND DIAGNOSIS OF CANCER 

Technical Field

The present invention relates generally to the detection and therapy of
cancer. The invention is more specifically related to nucleotide sequences that are
preferentially expressed in a tumor tissue and to polypeptides encoded by such
nucleotide sequences. The invention is more particularly related to nucleotide
sequences comprising at least a portion of a human endogenous retroviral sequence that
is preferentially expressed in a tumor tissue, and to polypeptides encoded by such
nucleotide sequences. The nucleotide sequences and polypeptides may be used in
vaccines and pharmaceutical compositions for the prevention and treatment of cancer.
The polypeptides may also be used for the production of compounds, such as
antibodies, useful for diagnosing and monitoring the progression of cancer.

Background of the Invention

In recent years, considerable research has been directed to the
identification of tumor markers, which may be useful for the diagnosis of particular
cancers, for predicting the outcome of the disease or for developing a therapy in a
patient-specific manner. Such research has generally focused on oncogenes (normal
cellular genes whose expression has been altered by gene amplification,
increased transcription, alteration of mRNA splicing or mutation within the coding
region) such that otherwise normal cells assume neoplastic growth behavior. To date,
however, the established markers have had limited utility, and their use often leads to
a result that is difficult to interpret.

Management of cancer currently relies on a combination of early
diagnosis and aggressive treatment, which may include one or more of a variety of
treatments such as surgery, [illegible], chemotherapy and hormone therapy.
However, current diagnostic methods often fail to detect cancer until the disease has
progressed to a state that is difficult to treat, and existing treatments often have serious
side effects. The high mortality observed among cancer patients indicates that
improvements are needed in the diagnosis and treatment of the disease.

Accordingly, there is a need in the art for improved tumor markers and
methods for therapy and diagnosis of cancer. The present invention fulfills these needs
and further provides other related advantages.
Summary of the Invention

Briefly stated, this invention provides compositions and methods for the
diagnosis and therapy of cancer. In one aspect, isolated DNA molecules are provided,
comprising: (a) a human endogenous retroviral sequence, wherein the retroviral
sequence is preferentially expressed in a tumor tissue; (b) a variant of the human
endogenous retroviral sequence that contains one or more nucleotide substitutions,
deletions, insertions and/or modifications at no more than 20% (preferably no more
than 5%) of the nucleotide positions, such that the antigenic and/or immunogenic
properties of the polypeptide encoded by the human endogenous retroviral sequence are
retained; or (c) a nucleotide sequence encoding an epitope of a polypeptide encoded by
at least one of the above sequences. Isolated DNA and RNA molecules comprising a
nucleotide sequence complementary to a DNA molecule as described above are also
provided.

In another aspect, the present invention provides an isolated DNA
molecule encoding an epitope of a polypeptide, the polypeptide being encoded by:
(a) a nucleotide sequence transcribed from the sequence of SEQ ID NO:11; or (b) a
variant of the nucleotide sequence that contains one or more nucleotide substitutions,
deletions, insertions and/or modifications at no more than 20% of the nucleotide
positions, such that the antigenic and/or immunogenic properties of the polypeptide
encoded by the nucleotide sequence are retained. Isolated DNA and RNA molecules
comprising a nucleotide sequence complementary to a DNA molecule as described
above are also provided.

In related aspects, the present invention provides recombinant
expression vectors comprising a DNA molecule as described above and host cells
transformed or transfected with such expression vectors.

In further aspects, polypeptides comprising an amino acid sequence
encoded by a DNA molecule as described above, and monoclonal antibodies that bind
to such polypeptides, are provided.

In another aspect, methods are provided for determining the presence of
a cancer in a patient. In one embodiment, the method comprises detecting, within a
biological sample obtained from a patient, a polypeptide as described above. In another
embodiment, the method comprises detecting, within a biological sample, an RNA
molecule encoding a polypeptide as described above. In yet another embodiment, the
method comprises (a) intradermally injecting a patient with a polypeptide as described
above; and (b) detecting an immune response on the patient's skin and therefrom
detecting the presence of a cancer in the patient.
In a related aspect, diagnostic kits useful in the determination of breast
cancer are provided. The diagnostic kits generally comprise one or more monoclonal
antibodies as described above, and a detection reagent. Within another related aspect,
the diagnostic kit comprises a first polymerase chain reaction primer and a second
polymerase chain reaction primer, the first and second primers each comprising at least
about 10 contiguous nucleotides of an RNA molecule encoding a polypeptide as
described above. Within yet another related aspect, the diagnostic kit comprises at least
one oligonucleotide probe, the probe comprising at least about 15 contiguous
nucleotides of a DNA molecule as described above.

In another aspect, the present invention provides methods for monitoring
the progression of a cancer in a patient. In one embodiment, the method comprises:
(a) detecting an amount, in a biological sample, of a polypeptide as described above;
(b) subsequently repeating step (a); and (c) comparing the amounts of polypeptide
detected in steps (a) and (b), and therefrom monitoring the progression of cancer in the
patient. In another embodiment, the method comprises (a) detecting an amount, within
a biological sample, of an RNA molecule encoding a polypeptide as described above;
(b) subsequently repeating step (a); and (c) comparing the amounts of RNA molecules
detected in steps (a) and (b), and therefrom monitoring the progression of cancer in the
patient.

In other aspects, pharmaceutical compositions, which comprise a
polypeptide as described above and a physiologically acceptable carrier, and vaccines,
which comprise a polypeptide as described above and an immune response enhancer,
are provided.

In related aspects, the present invention provides methods for inhibiting
the development of a cancer in a patient, comprising administering to a patient a
pharmaceutical composition or vaccine as described above.

These and other aspects of the present invention will become apparent 
upon reference to the following detailed description and attached drawings. All 
references disclosed herein are hereby incorporated by reference in their entirety as if 
each was incorporated individually. 

Brief Description of the Drawings

Figure 1 shows the differential display PCR products, separated by gel
electrophoresis, obtained from cDNA prepared from normal breast tissue (lanes 1 and
2) and from cDNA prepared from breast tumor tissue from the same patient (lanes 3
and 4). The arrow indicates the band corresponding to B18Ag1.

Figure 2 is a northern blot comparing the level of B18Ag1 mRNA in
breast tumor tissue (lane 1) with the level in normal breast tissue.

Figure 3 shows the level of B18Ag1 mRNA in breast tumor tissue
compared to that in various normal and non-breast tumor tissues as determined by
RNase protection assays.

Figure 4 is a genomic clone map showing the location of additional
retroviral sequences (provided in SEQ ID NO:3 - SEQ ID NO:10) relative to B18Ag1.

Figures 5A and 5B show the sequencing strategy, genomic organization,
and predicted open reading frame for the retroviral element containing B18Ag1.

Figure 6 shows the nucleotide sequence of the representative human
endogenous retroviral element B18Ag1.

Detailed Description of the Invention 

As noted above, the present invention is generally directed to
compositions and methods for the diagnosis, monitoring and therapy of cancer. The
compositions described herein include polypeptides, nucleic acid sequences and
antibodies. Polypeptides of the present invention generally comprise at least a portion
of a protein that is encoded by a human endogenous retroviral sequence, wherein the
human endogenous retroviral sequence is expressed at substantially greater levels in a
human tumor tissue than in normal tissue (i.e., the level of RNA encoding the
polypeptide is at least two fold higher, and preferably at least five fold higher, in a
tumor tissue than in normal tissue). Such sequences are said to be "preferentially
expressed" in a tumor tissue. Any cancer characterized by increased expression of a
human endogenous retroviral sequence within a tumor may be detected and/or treated
according to the present invention. Representative cancers include breast cancer,
prostate cancer, leukemia, lymphoma and Kaposi's sarcoma. As used herein, the term
"polypeptide" encompasses amino acid chains of any length, including full length
proteins (and epitopes thereof) encoded by a human endogenous retroviral sequence.
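
The two fold and five fold thresholds above can be expressed as a small check. The following is a minimal sketch, not a method taken from this specification: the function names and the idea of passing two already-normalized RNA levels are assumptions made for illustration.

```python
def fold_change(tumor_level: float, normal_level: float) -> float:
    """Ratio of the RNA level in tumor tissue to the level in normal tissue."""
    if normal_level <= 0:
        raise ValueError("normal_level must be positive")
    return tumor_level / normal_level


def classify_expression(tumor_level: float, normal_level: float) -> str:
    """Apply the 'at least two fold' and 'preferably at least five fold' thresholds."""
    ratio = fold_change(tumor_level, normal_level)
    if ratio >= 5.0:
        return "preferentially expressed (at least five fold)"
    if ratio >= 2.0:
        return "preferentially expressed (at least two fold)"
    return "not preferentially expressed"
```

How the RNA levels are measured and normalized (for example by Northern blot or RNase protection) is outside the scope of this sketch.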

Nucleic acid sequences of the subject invention generally comprise a
DNA or RNA sequence that encodes a polypeptide as described above, or that is
complementary to such a sequence. Antibodies are generally immune system proteins,
or fragments thereof, that are capable of binding to a portion of a polypeptide as
described above. Antibodies can be produced by cell culture techniques, including the
generation of monoclonal antibodies as described herein, or via transfection of antibody
genes into suitable bacterial or mammalian cell hosts, in order to allow for the
production of recombinant antibodies.

Polypeptides within the scope of this invention include, but are not
limited to, polypeptides (and epitopes thereof) encoded by the human endogenous
retroviral sequences described herein. Such sequences include the sequence designated
B18Ag1 (SEQ ID NO:1) as well as other sequences such as those recited in SEQ ID
NO:3 - SEQ ID NO:10, found within the retroviral genome containing B18Ag1 (SEQ ID
NO:11). B18Ag1 has homology to the P30 gene of the endogenous human retroviral
element S71, as described in Werner et al., Virology 174:225-238 (1990). As used
herein, the term "polypeptide" encompasses amino acid chains of any length, including
full length proteins encoded by a human endogenous retroviral element. A polypeptide
comprising an epitope of a human endogenous retroviral element may consist entirely
of the epitope, or may contain additional sequences. The additional sequences may be
derived from the native protein or may be heterologous, and such sequences may (but
need not) possess immunogenic or antigenic properties.

An "epitope," as used herein, is a portion of a polypeptide that is
recognized (i.e., specifically bound) by a B-cell and/or T-cell surface antigen receptor.
Epitopes may generally be identified using well known techniques, such as those
summarized in Paul, Fundamental Immunology, 3rd ed., 243-247 (Raven Press, 1993)
and references cited therein. Such techniques include screening polypeptides derived
from the native polypeptide for the ability to react with antigen-specific antisera and/or
T-cell lines or clones. An epitope of a polypeptide is a portion that reacts with such
antisera and/or T-cells at a level that is similar to the reactivity of the full length
polypeptide (e.g., in an ELISA and/or T-cell reactivity assay). Such screens may
generally be performed using methods well known to those of ordinary skill in the art,
such as those described in Harlow and Lane, Antibodies: A Laboratory Manual, Cold
Spring Harbor Laboratory, 1988. B-cell and T-cell epitopes may also be predicted via
computer analysis. Polypeptides comprising an epitope of a polypeptide that is
preferentially expressed in a tumor tissue (with or without additional amino acid
sequence) are within the scope of the present invention.

The compositions and methods of the present invention also encompass
variants of the above polypeptides and nucleic acid sequences encoding such
polypeptides. A polypeptide "variant," as used herein, is a polypeptide that differs from
the native polypeptide in substitutions and/or modifications such that the antigenic
and/or immunogenic properties of the polypeptide are retained. Such variants may
generally be identified by modifying one of the above polypeptide sequences and
evaluating the reactivity of the modified polypeptide with antisera and/or T-cells as
described above. Nucleic acid variants may contain one or more substitutions,
deletions, insertions and/or modifications such that the antigenic and/or immunogenic
properties of the encoded polypeptide are retained. One preferred variant of a human
endogenous retroviral sequence, or an epitope thereof, is a variant that contains
nucleotide substitutions, deletions, insertions and/or modifications at no more than 20%
of the nucleotide positions within the native polypeptide sequence.

Preferably, a variant contains conservative substitutions. A
"conservative substitution" is one in which an amino acid is substituted for another
amino acid that has similar properties, such that one skilled in the art of peptide
chemistry would expect the secondary structure and hydropathic nature of the
polypeptide to be substantially unchanged. In general, the following groups of amino
acids represent conservative changes: (1) ala, pro, gly, glu, asp, gln, asn, ser, thr;
(2) cys, ser, tyr, thr; (3) val, ile, leu, met, ala, phe; (4) lys, arg, his; and (5) phe, tyr, trp,
his.
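
The five groups listed above lend themselves to a simple lookup. The following is a minimal sketch, assuming that a substitution counts as conservative when the two residues appear together in at least one listed group; the function name and the use of three-letter codes are illustrative choices, not part of the specification.

```python
# The five groups of amino acids listed in the text, as three-letter codes.
CONSERVATIVE_GROUPS = [
    {"ala", "pro", "gly", "glu", "asp", "gln", "asn", "ser", "thr"},
    {"cys", "ser", "tyr", "thr"},
    {"val", "ile", "leu", "met", "ala", "phe"},
    {"lys", "arg", "his"},
    {"phe", "tyr", "trp", "his"},
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues share at least one of the listed groups."""
    a, b = original.lower(), replacement.lower()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Because some residues (for example ser, thr, ala, phe, tyr and his) appear in more than one group, the check tests every group rather than assigning each residue a single class.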

Variants may also (or alternatively) be modified by, for example, the
deletion or addition of amino acids that have minimal influence on the immunogenic or
antigenic properties, secondary structure and hydropathic nature of the polypeptide.
For example, a polypeptide may be conjugated to a signal (or leader) sequence at the
N-terminal end of the protein which co-translationally or post-translationally directs
transfer of the protein. The polypeptide may also be conjugated to a linker or other
sequence for ease of synthesis, purification or identification of the polypeptide (e.g.,
poly-His), or to enhance binding of the polypeptide to a solid support. For example, a
polypeptide may be conjugated to an immunoglobulin Fc region.

Human endogenous retroviral sequences that are expressed at
substantially greater levels in a human tumor tissue than in normal tissue may be
prepared using any of several techniques. For example, the human endogenous
retroviral sequence designated B18Ag1 (Figure 6 and SEQ ID NO:1) may be cloned on
the basis of its breast tumor specific expression, using differential display PCR. This
technique compares the amplified products from poly A+ or total RNA template
prepared from normal and breast tumor tissue. cDNA may be prepared by reverse
transcription of RNA using a (dT)12AG primer. Following amplification using the
primer CCTCAACCTC (SEQ ID NO:13), a band corresponding to an amplified
product specific to the tumor RNA may be cut out from a silver stained gel and
subcloned into a suitable vector (e.g., the T-vector, Novagen, Madison, WI).

Alternatively, the B18Ag1 gene (or a portion thereof) may be amplified
from human genomic DNA, or from breast tumor cDNA, via polymerase chain
reaction. For this approach, B18Ag1 sequence-specific primers may be designed based
on the sequence provided in SEQ ID NO:1, and may be purchased or synthesized. One
suitable primer pair for amplification from breast tumor cDNA is (5'ATG GCT ATT
TTC 6GG GGC TGA CA) (SEQ ID NO:14) and (5'GCG GTA TGT CCT GGT GGG
TAT T) (SEQ ID NO:15). An amplified portion of B18Ag1 may then be used to isolate
the full length gene from a human genomic DNA library or from a breast tumor cDNA
library, using well known techniques such as those described in Sambrook et al.,
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, Cold
Spring Harbor, NY (1989). Other sequences within the retroviral genome containing
B18Ag1, such as those recited in SEQ ID NO:3 - SEQ ID NO:10, may be similarly
prepared by screening human genomic libraries using B18Ag1-specific sequences as
probes.
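
When working with primers such as those recited above, a few basic properties are often computed before use. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the Wallace rule (2 °C per A/T plus 4 °C per G/C) is a general rule of thumb for short oligonucleotides that is not taken from this specification. SEQ ID NO:14 is omitted because one of its bases is not legible in this copy, and the validation step would reject it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(primer: str) -> str:
    """Strip spaces and upper-case; reject anything other than A, C, G or T."""
    seq = primer.replace(" ", "").upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"not an unambiguous DNA sequence: {primer!r}")
    return seq


def reverse_complement(primer: str) -> str:
    """Sequence of the complementary strand, written 5' to 3'."""
    return normalize(primer).translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    """Fraction of bases that are G or C."""
    seq = normalize(primer)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule (an assumption here)."""
    seq = normalize(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


SEQ_ID_13 = "CCTCAACCTC"                      # differential display primer
SEQ_ID_15 = "GCG GTA TGT CCT GGT GGG TAT T"   # second primer of the B18Ag1 pair
```

The Wallace estimate is coarse and is best treated as a starting point; annealing temperatures would still be determined empirically.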

Other human endogenous retroviral sequences that are expressed at
substantially greater levels in a human tumor tissue than in normal tissue may be
prepared using methods known to those of ordinary skill in the art. For example, such
sequences may be identified using low stringency hybridization, followed by PCR to
identify conserved motifs. The level of expression in tumor tissue may generally be
evaluated using the methods described herein, such as PCR and Northern blot analysis.

Recombinant polypeptides encoded by the DNA sequences described
above may be readily prepared from the DNA sequences. For example, supernatants
from suitable host/vector systems which secrete recombinant polypeptide into culture
media may be first concentrated using a commercially available filter. Following
concentration, the concentrate may be applied to a suitable purification matrix such as
an affinity matrix or an ion exchange resin. Finally, one or more reverse phase HPLC
steps can be employed to further purify a recombinant polypeptide.

In general, any of a variety of expression vectors known to those of
ordinary skill in the art may be employed to express recombinant polypeptides of this
invention. Expression may be achieved in any appropriate host cell that has been
transformed or transfected with an expression vector containing a DNA molecule that
encodes a recombinant polypeptide. Suitable host cells include prokaryotes, yeast and
higher eukaryotic cells. Preferably, the host cells employed are E. coli, yeast or a
mammalian cell line such as COS or CHO.

Such techniques may also be used to prepare polypeptides comprising
epitopes or variants of the native polypeptides. For example, variants of a native
polypeptide may generally be prepared using standard mutagenesis techniques, such as
oligonucleotide-directed site-specific mutagenesis, and sections of the DNA sequence
may be removed to permit preparation of truncated polypeptides. Portions and other
variants having fewer than about 100 amino acids, and generally fewer than about 50
amino acids, may also be generated by synthetic means, using techniques well known
to those of ordinary skill in the art. For example, such polypeptides may be synthesized
using any of the commercially available solid-phase techniques, such as the Merrifield
solid-phase synthesis method, where amino acids are sequentially added to a growing
amino acid chain. See Merrifield, J. Am. Chem. Soc. 85:2149-2154, 1963. Equipment
for automated synthesis of polypeptides is commercially available from suppliers such
as Applied BioSystems, Inc., Foster City, CA, and may be operated according to the
manufacturer's instructions.

In specific embodiments, polypeptides of the present invention
encompass polypeptides encoded by a human endogenous retroviral sequence that is
expressed at substantially greater levels in a human tumor tissue than in normal tissue
(such as the sequence recited in SEQ ID NO:1), variants of such polypeptides that are
encoded by DNA molecules containing one or more nucleotide substitutions, deletions,
insertions and/or modifications at no more than 20% of the nucleotide positions, and
epitopes of the above polypeptides. Polypeptides within the scope of the present
invention also include polypeptides (and epitopes thereof) encoded by DNA sequences
that hybridize to the above sequences under stringent conditions, wherein the DNA
sequences are at least 80% identical in overall sequence to the sequence recited in SEQ
ID NO:1, and wherein RNA corresponding to said nucleotide sequence is expressed at a
greater level in human tumor tissue than in the corresponding normal tissue. As used
herein, "stringent conditions" refers to prewashing in a solution of 6X SSC, 0.2% SDS;
hybridizing overnight at 65°C in 6X SSC, 0.2% SDS; followed by washing twice at
65°C for 30 minutes each with 1X SSC, 0.1% SDS, and then washing twice at 65°C for
30-60 minutes each with 0.1X SSC, 0.1% SDS. DNA molecules according to the present
invention include molecules that encode any of the above polypeptides.
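
The "no more than 20%" (preferably "no more than 5%") limit on altered nucleotide positions can be checked once a variant has been aligned to the native sequence. The following is a minimal sketch, assuming the two sequences are supplied as pre-aligned strings of equal length in which "-" marks an inserted or deleted position, and assuming the fraction is taken over aligned columns; the specification does not prescribe an alignment method or a denominator, so both are illustrative choices.

```python
def fraction_changed(native_aligned: str, variant_aligned: str) -> float:
    """Fraction of aligned columns at which the variant differs from the native sequence."""
    if len(native_aligned) != len(variant_aligned):
        raise ValueError("aligned sequences must have the same length")
    if not native_aligned:
        raise ValueError("sequences must not be empty")
    pairs = zip(native_aligned.upper(), variant_aligned.upper())
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(native_aligned)


def within_variant_limit(native_aligned: str, variant_aligned: str,
                         limit: float = 0.20) -> bool:
    """True if the variant differs at no more than `limit` of the aligned positions."""
    return fraction_changed(native_aligned, variant_aligned) <= limit
```

Passing `limit=0.05` applies the preferred threshold instead. A sequence that passes this check would still need to retain the antigenic and/or immunogenic properties described above to qualify as a variant.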

In another aspect of the present invention, antibodies are provided. Such
antibodies may be prepared by any of a variety of techniques known to those of
ordinary skill in the art. See, e.g., Harlow and Lane, Antibodies: A Laboratory
Manual, Cold Spring Harbor Laboratory, 1988. In one such technique, an immunogen
comprising the polypeptide is initially injected into any of a wide variety of mammals
(e.g., mice, rats, rabbits, sheep or goats). In this step, the polypeptides of this invention
may serve as the immunogen without modification. Alternatively, particularly for
relatively short polypeptides, a superior immune response may be elicited if the
polypeptide is joined to a carrier protein, such as bovine serum albumin or keyhole
limpet hemocyanin. The immunogen is injected into the animal host, preferably
according to a predetermined schedule incorporating one or more booster
immunizations, and the animals are bled periodically. Polyclonal antibodies specific
for the polypeptide may then be purified from such antisera by, for example, affinity
chromatography using the polypeptide coupled to a suitable solid support.

Monoclonal antibodies specific for the antigenic polypeptide of interest
may be prepared, for example, using the technique of Kohler and Milstein, Eur. J.
Immunol. 6:511-519, 1976, and improvements thereto. Briefly, these methods involve
the preparation of immortal cell lines capable of producing antibodies having the
desired specificity (i.e., reactivity with the polypeptide of interest). Such cell lines may
be produced, for example, from spleen cells obtained from an animal immunized as
described above. The spleen cells are then immortalized by, for example, fusion with a
myeloma cell fusion partner, preferably one that is syngeneic with the immunized
animal. A variety of fusion techniques may be employed. For example, the spleen
cells and myeloma cells may be combined with a nonionic detergent for a few minutes
and then plated at low density on a selective medium that supports the growth of hybrid
cells, but not myeloma cells. A preferred selection technique uses HAT (hypoxanthine,
aminopterin, thymidine) selection. After a sufficient time, usually about 1 to 2 weeks,
colonies of hybrids are observed. Single colonies are selected and their culture
supernatants tested for binding activity against the polypeptide. Hybridomas having
high reactivity and specificity are preferred.

Monoclonal antibodies may be isolated from the supernatants of
growing hybridoma colonies. In addition, various techniques may be employed to
enhance the yield, such as injection of the hybridoma cell line into the peritoneal cavity
of a suitable vertebrate host, such as a mouse. Monoclonal antibodies may then be
harvested from the ascites fluid or the blood. Contaminants may be removed from the
antibodies by conventional techniques, such as chromatography, gel filtration,
precipitation, and extraction. The polypeptides of this invention may be used in the
purification process in, for example, an affinity chromatography step.

Antibodies may be used, for example, in methods for detecting a cancer
(such as breast cancer, prostate cancer, leukemia, lymphoma or Kaposi's sarcoma) in a
patient. Such methods involve using one or more antibodies to detect the presence or
absence of a polypeptide as described herein in a suitable biological sample. As used
herein, suitable biological samples include tumor or normal tissue biopsy, mastectomy,
blood, lymph node, serum and urine samples or other tissue, homogenate or extract
thereof, obtained from a patient. It will be evident to those of ordinary skill in the art
that following detection of a polypeptide within a non-biopsy sample, additional tumor
markers may be employed to identify the particular type of cancer.

There are a variety of assay formats known to those of ordinary skill in
the art for using an antibody to detect polypeptide markers in a sample. See, e.g.,
Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory,
1988. For example, the assay may be performed in a Western blot format wherein a
protein preparation from the biological sample is submitted to gel electrophoresis,
transferred to a suitable membrane and allowed to react with antibody. The presence of
antibody on the membrane may then be detected using a suitable detection reagent, as
described below.

In another embodiment, the assay involves the use of an antibody
immobilized on a solid support to bind to the polypeptide and remove it from the
remainder of the sample. The bound polypeptide may then be detected using a second
antibody that binds to the binding partner/polypeptide complex and contains a reporter
group. Alternatively, a competitive assay may be utilized, in which a polypeptide is
labeled with a reporter group and allowed to bind to the immobilized antibody after
incubation of the antibody with the sample. The extent to which components of the
sample inhibit the binding of the labeled polypeptide to the antibody is indicative of the
reactivity of the sample with the immobilized antibody, and as a result is indicative of
the concentration of polypeptide in the sample.

The solid support may be any material known to those of ordinary skill
in the art to which the antibody may be attached. For example, the solid support may
be a test well in a microtiter plate or a nitrocellulose filter or other suitable membrane.
Alternatively, the support may be a bead or disc, such as glass, fiberglass, latex or a
plastic material such as polystyrene or polyvinylchloride. The support may also be a
magnetic particle or a fiber optic sensor, such as those disclosed, for example, in U.S.
Patent No. 5,359,681.

The antibody may be immobilized on the solid support using a variety of
techniques known to those in the art, which are amply described in the patent and
scientific literature. In the context of the present invention, the term "immobilization"
refers to both noncovalent association, such as adsorption, and covalent attachment
(which may be a direct linkage between the antigen and functional groups on the
support or may be a linkage by way of a cross-linking agent). Immobilization by
adsorption to a well in a microtiter plate or to a membrane is preferred. In such cases,
adsorption may be achieved by contacting the antibody, in a suitable buffer, with the
solid support for a suitable amount of time. The contact time varies with temperature,
but is typically between about 1 hour and 1 day. In general, contacting a well of a
plastic microtiter plate with an amount of antibody [range illegible], and preferably
about 100-200 ng, is sufficient to immobilize an adequate amount of polypeptide.

Covalent attachment of antibody to a solid support may generally be
achieved by first reacting the support with a bifunctional reagent that will react with
both the support and a functional group, such as a hydroxyl or amino group, on the
antibody. For example, the antibody may be covalently attached to supports having an
appropriate polymer coating using benzoquinone or by condensation of an aldehyde
group on the support with an amine and an active hydrogen on the binding partner (see,
e.g., Pierce Immunotechnology Catalog and Handbook (1991) at A12-A13).

In certain embodiments for detection of polypeptide in a sample, the
assay is a two-antibody sandwich assay. This assay may be performed by first
contacting an antibody that has been immobilized on a solid support, commonly the
well of a microtiter plate, with the biological sample, such that the polypeptide within
the sample is allowed to bind to the immobilized antibody. Unbound sample is then
removed from the immobilized polypeptide-antibody complexes and a second antibody
(containing a reporter group) capable of binding to a different site on the polypeptide is
added. The amount of second antibody that remains bound to the solid support is then
determined using a method appropriate for the specific reporter group.

More specifically, once the antibody is immobilized on the support as
described above, the remaining protein binding sites on the support are typically
blocked. Any suitable blocking agent known to those of ordinary skill in the art, such
as bovine serum albumin or Tween 20™ (Sigma Chemical Co., St. Louis, MO), may be
used. The immobilized antibody is then incubated with the sample, and polypeptide is
allowed to bind to the antibody. The sample may be diluted with a suitable diluent, such
as phosphate-buffered saline (PBS), prior to incubation. In general, an appropriate contact
time (i.e., incubation time) is that period of time that is sufficient to detect the presence
of polypeptide within a sample obtained from an individual with breast cancer.
Preferably, the contact time is sufficient to achieve a level of binding that is at least
95% of that achieved at equilibrium between bound and unbound polypeptide. Those
of ordinary skill in the art will recognize that the time necessary to achieve equilibrium
may be readily determined by assaying the level of binding that occurs over a period of
time. At room temperature, an incubation time of about 30 minutes is generally
sufficient.
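
The 95%-of-equilibrium criterion can be read off a measured binding time course. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the time course is given as (minutes, signal) pairs sorted by time, and the final reading is taken as the equilibrium level, which only holds if the series was followed until binding stopped increasing.

```python
def time_to_fraction_of_equilibrium(time_course, fraction=0.95):
    """Return the earliest sampled time at which binding reaches `fraction` of equilibrium.

    time_course: list of (minutes, signal) pairs sorted by increasing time.
    The last signal is treated as the equilibrium level (an assumption).
    """
    if not time_course:
        raise ValueError("time_course must not be empty")
    equilibrium = time_course[-1][1]
    if equilibrium <= 0:
        raise ValueError("equilibrium signal must be positive")
    target = fraction * equilibrium
    for minutes, signal in time_course:
        if signal >= target:
            return minutes
    return time_course[-1][0]  # unreachable for fraction <= 1, kept as a safe fallback


# Hypothetical readings for illustration only; not data from this specification.
EXAMPLE_COURSE = [(5, 40.0), (10, 70.0), (20, 90.0), (30, 96.0), (60, 100.0)]
```

With the illustrative readings above, the function returns the 30 minute time point, since 96.0 is the first signal at or above 95% of the final value.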

Unbound sample may then be removed by washing the solid support
with an appropriate buffer, such as PBS containing 0.1% Tween 20™. The second
antibody, which contains a reporter group, may then be added to the solid support.
Preferred reporter groups include enzymes (such as horseradish peroxidase), substrates,
cofactors, inhibitors, dyes, radionuclides, luminescent groups, fluorescent groups and
biotin. The conjugation of antibody to reporter group may be achieved using standard
methods known to those of ordinary skill in the art.

The second antibody is then incubated with the immobilized antibody-polypeptide
complex for an amount of time sufficient to detect the bound polypeptide.
An appropriate amount of time may generally be determined by assaying the level of
binding that occurs over a period of time. Unbound second antibody is then removed
and bound second antibody is detected using the reporter group. The method employed
for detecting the reporter group depends upon the nature of the reporter group. For
radioactive groups, scintillation counting or autoradiographic methods are generally
appropriate. Spectroscopic methods may be used to detect dyes, luminescent groups
and fluorescent groups. Biotin may be detected using avidin, coupled to a different
reporter group (commonly a radioactive or fluorescent group or an enzyme). Enzyme
reporter groups may generally be detected by the addition of substrate (generally for a
specific period of time), followed by spectroscopic or other analysis of the reaction
products.

To determine the presence or absence of a cancer, the signal detected
from the reporter group that remains bound to the solid support is generally compared
to a signal that corresponds to a predetermined cut-off value. In one preferred
embodiment, the cut-off value is the mean signal obtained when the
immobilized antibody is incubated with samples from patients without cancer. In
general, a sample generating a signal that is three standard deviations above the
predetermined cut-off value may be considered positive for a cancer. In an alternate
preferred embodiment, the cut-off value is determined using a Receiver Operator
Curve, according to the method of Sackett et al., Clinical Epidemiology: A Basic
Science for Clinical Medicine, p. 106-7 (Little Brown and Co., 1985). Briefly, in this
embodiment, the cut-off value may be determined from a plot of pairs of true positive
rates (i.e., sensitivity) and false positive rates (100%-specificity) that correspond to
each possible cut-off value for the diagnostic test result. The cut-off value on the plot
that is the closest to the upper left-hand corner (i.e., the value that encloses the largest
area) is the most accurate cut-off value, and a sample generating a signal that is higher
than the cut-off value determined by this method may be considered positive.
Alternatively, the cut-off value may be shifted to the left along the plot, to minimize the
false positive rate, or to the right, to minimize the false negative rate. In general, a
sample generating a signal that is higher than the cut-off value determined by this
method is considered positive for a cancer.
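The two cut-off rules described above can be sketched in a few lines. This is an illustrative reading, not the procedure of Sackett et al.: the function names and the absorbance readings are hypothetical, "closest to the upper left-hand corner" is taken here as the smallest Euclidean distance to the point (false positive rate 0, true positive rate 1), and the sample standard deviation is assumed for the first rule.

```python
import math
from statistics import mean, stdev


def three_sd_threshold(negative_signals):
    """Mean signal of cancer-free controls plus three sample standard deviations."""
    return mean(negative_signals) + 3 * stdev(negative_signals)


def roc_cutoff(positive_signals, negative_signals):
    """Return (cutoff, true_positive_rate, false_positive_rate) for the
    candidate cut-off whose ROC point lies closest to the upper left-hand
    corner. A sample is called positive when its signal is strictly higher
    than the cut-off.
    """
    best = None
    for cutoff in sorted(set(positive_signals) | set(negative_signals)):
        tpr = sum(s > cutoff for s in positive_signals) / len(positive_signals)
        fpr = sum(s > cutoff for s in negative_signals) / len(negative_signals)
        distance = math.hypot(fpr, 1.0 - tpr)  # distance to the point (0, 1)
        if best is None or distance < best[0]:
            best = (distance, cutoff, tpr, fpr)
    return best[1:]


# Hypothetical absorbance readings, for illustration only.
controls = [0.10, 0.12, 0.08, 0.11, 0.09]
patients = [0.45, 0.60, 0.10, 0.52, 0.38]

threshold = three_sd_threshold(controls)
cutoff, tpr, fpr = roc_cutoff(patients, controls)
```

Shifting the chosen cut-off upward trades sensitivity for a lower false positive rate, and shifting it downward does the reverse, which corresponds to moving left or right along the plot.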

In a related embodiment, the assay is performed in a flow-through or
strip test format, wherein the antibody is immobilized on a membrane, such as
nitrocellulose. In the flow-through test, the polypeptide within the sample binds to the
immobilized antibody as the sample passes through the membrane. A second, labeled
antibody then binds to the antibody-polypeptide complex as a solution containing the
second antibody flows through the membrane. The detection of bound second antibody
may then be performed as described above. In the strip test format, one end of the
membrane to which antibody is bound is immersed in a solution containing the sample.
The sample migrates along the membrane through a region containing second antibody
and to the area of immobilized antibody. Concentration of second antibody at the area
of immobilized antibody indicates the presence of breast cancer. Typically, the
concentration of second antibody at that site generates a pattern, such as a line, that can
be read visually. The absence of such a pattern indicates a negative result. In general,
the amount of antibody immobilized on the membrane is selected to generate a visually
discernible pattern when the biological sample contains a level of polypeptide that
would be sufficient to generate a positive signal in the two-antibody sandwich assay, in
the format discussed above. Preferably, the amount of antibody immobilized on the
membrane ranges from about 25 ng to about 1 µg, and more preferably from about 50
ng to about 1 µg. Such tests can typically be performed with a very small amount of
biological sample.

The presence or absence of a cancer in a patient may also be determined
by evaluating the level of mRNA encoding a polypeptide of the present invention
within the biological sample (e.g., a biopsy, mastectomy and/or blood sample from a
patient) relative to a predetermined cutoff value. Such an evaluation may be achieved
using any of a variety of methods known to those of ordinary skill in the art such as, for
example, in situ hybridization and amplification by polymerase chain reaction. For
example, polymerase chain reaction may be used to amplify sequences from cDNA
prepared from RNA that is isolated from one of the above biological samples.
Sequence-specific primers for use in such amplification may be designed based on a
cDNA or genomic sequence, such as a sequence provided in SEQ ID NO:1 or SEQ ID
NO:3 - SEQ ID NO:10, and may be purchased or synthesized. In the case of B18Ag1,
as noted herein, one suitable primer pair is (5'ATG GCT ATT TTC GGG GGC TGA
CA) (SEQ ID NO:14) and (5'CCG GTA TCT CCT CGT GGG TAT T) (SEQ ID
NO:15). The PCR reaction products may then be separated and visualized using gel
electrophoresis, according to methods well known to those of ordinary skill in the art.
Amplification is typically performed on samples obtained from matched pairs of tissue
(tumor and non-tumor tissue from the same individual) or from unmatched pairs of
tissue (tumor and non-tumor tissue from different individuals). The amplification
reaction is preferably performed on several dilutions of cDNA spanning two orders of
magnitude. A two-fold or greater increase in expression in several dilutions of the
tumor sample as compared to the same dilution of the non-tumor sample is considered
positive.
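The two-fold criterion across dilutions can be sketched as below. The band intensities are hypothetical, and reading "several dilutions" as "at least two" is an assumption made only for illustration, exposed here as the `min_dilutions` parameter.

```python
def exceeds_fold(tumor_signal, non_tumor_signal, fold):
    """True when the tumor signal is at least `fold` times the non-tumor signal.
    A detectable tumor signal with no non-tumor signal counts as an increase."""
    if non_tumor_signal == 0:
        return tumor_signal > 0
    return tumor_signal / non_tumor_signal >= fold


def is_expression_positive(tumor, non_tumor, fold=2.0, min_dilutions=2):
    """`tumor` and `non_tumor` map a dilution factor to a band intensity.
    Only dilutions measured in both samples are compared.
    Returns (is_positive, list of dilutions meeting the fold criterion)."""
    shared = sorted(set(tumor) & set(non_tumor))
    hits = [d for d in shared if exceeds_fold(tumor[d], non_tumor[d], fold)]
    return len(hits) >= min_dilutions, hits


# Hypothetical band intensities at 1:1, 1:10 and 1:100 dilutions of cDNA.
tumor_bands = {1: 800, 10: 90, 100: 12}
normal_bands = {1: 300, 10: 40, 100: 0}
positive, dilutions_met = is_expression_positive(tumor_bands, normal_bands)
```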

Conventional RT-PCR protocols using agarose and ethidium bromide
staining, while important in defining gene specificity, do not lend themselves to
diagnostic kit development because of the time and effort required in making them
quantitative (i.e., construction of saturation and/or titration curves), and their limited
sample throughput. This problem is overcome by the development of procedures such as
real time RT-PCR, which allows for assays to be performed in single tubes, and in turn
can be modified for use in 96 well plate formats. Instrumentation to perform such
methodologies is available from ABI/Perkin Elmer. Alternatively, other high
throughput assays using labelled probes (e.g., digoxygenin) in combination with
labelled (e.g., enzyme, fluorescent, radioactive) antibodies to such probes can also be
used in the development of 96 well plate assays.

In yet another method for determining the presence or absence of a
cancer in a patient, one or more of the polypeptides described above may be used in a
skin test. As used herein, a "skin test" is any assay performed directly on a patient in
which a delayed-type hypersensitivity (DTH) reaction (such as swelling, reddening or
dermatitis) is measured following intradermal injection of one or more polypeptides as
described above. Such injection may be achieved using any suitable device sufficient
to contact the polypeptide or polypeptides with dermal cells of the patient, such as a
tuberculin syringe or 1 mL syringe. Preferably, the reaction is measured at least 48
hours after injection, more preferably 48-72 hours.

The DTH reaction is a cell-mediated immune response, which is greater
in patients that have been exposed previously to a test antigen (i.e., an immunogenic
portion of a polypeptide employed, or a variant thereof). The response may be measured
visually, using a ruler. In general, a response that is greater than about 0.5 cm in
diameter, preferably greater than about 1.0 cm in diameter, is a positive response,
indicative of a cancer. As noted above, additional tumor markers may be employed,
using methods known to those of ordinary skill in the art, to identify the type of cancer
present.

The polypeptides of this invention are preferably formulated, for use in a
skin test, as pharmaceutical compositions containing at least one polypeptide and a
physiologically acceptable carrier, such as water, saline, alcohol, or a buffer. Such
compositions typically contain one or more of the above polypeptides in an amount
ranging from about 1 µg to 100 µg, preferably from about 10 µg to 50 µg, in a volume
of 0.1 mL. Preferably, the carrier employed in such pharmaceutical compositions is a
saline solution with appropriate preservatives, such as phenol and/or Tween 80™.

In other aspects of the present invention, the progression and/or response
to treatment of a cancer may be monitored by performing any of the above assays over
a period of time, and evaluating the change in the level of the response (i.e., the amount
of polypeptide or mRNA detected or, in the case of a skin test, the extent of the immune
response detected). For example, the assays may be performed every 1-2 months for a
period of 1-2 years. In general, a cancer is progressing in those patients in whom the
level of the response increases over time. In contrast, a cancer is not progressing when
the signal detected either remains constant or decreases with time.
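One way to express the monitoring rule above is to look at the direction of change across serial measurements. The sketch below uses an ordinary least-squares slope, which is an illustrative choice and not something the specification prescribes; the function name, the `tolerance` parameter and the readings are hypothetical.

```python
def response_trend(months, levels, tolerance=0.0):
    """Classify serial assay results as 'progressing' when the fitted slope
    of level versus time exceeds `tolerance`, otherwise 'not progressing'.

    Uses an ordinary least-squares slope (an illustrative assumption)."""
    n = len(months)
    if n < 2 or n != len(levels):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(months) / n
    mean_y = sum(levels) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    if sxx == 0:
        raise ValueError("measurements must be taken at different times")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, levels)) / sxx
    return "progressing" if slope > tolerance else "not progressing"


# Hypothetical signals measured every two months.
rising = response_trend([0, 2, 4, 6], [1.0, 1.4, 1.9, 2.6])
falling = response_trend([0, 2, 4, 6], [2.0, 1.8, 1.8, 1.5])
```

A non-zero `tolerance` would allow small assay-to-assay fluctuations to be treated as "remaining constant".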

In further aspects of the present invention, the compounds described
herein may be used for the immunotherapy of a cancer. In these aspects, the
compounds (which may be polypeptides, antibodies or nucleic acid molecules) are
preferably incorporated into pharmaceutical compositions or vaccines. Pharmaceutical
compositions comprise one or more such compounds and a physiologically acceptable
carrier. Vaccines may comprise one or more polypeptides and an immune response
enhancer, such as an adjuvant or a liposome (into which the compound is incorporated).
Pharmaceutical compositions and vaccines may additionally contain a delivery system,
such as biodegradable microspheres, which are disclosed, for example, in U.S. Patent
Nos. 4,897,268 and 5,075,109. Pharmaceutical compositions and vaccines within the
scope of the present invention may also contain other compounds, including one or
more separate polypeptides.

Alternatively, a vaccine may contain DNA encoding one or more of the
polypeptides as described above, such that the polypeptide is generated in situ. In such
vaccines, the DNA may be present within any of a variety of delivery systems known to
those of ordinary skill in the art, including nucleic acid expression systems, bacteria and
viral expression systems. Appropriate nucleic acid expression systems contain the
necessary DNA sequences for expression in the patient (such as a suitable promoter and
terminating signal). Bacterial delivery systems involve the administration of a
bacterium (such as Bacillus-Calmette-Guerin) that expresses an immunogenic portion
of the polypeptide on its cell surface. In a preferred embodiment, the DNA may be
introduced using a viral expression system (e.g., vaccinia or other pox virus, retrovirus,
or adenovirus), which may involve the use of a non-pathogenic (defective), replication
competent virus. Techniques for incorporating DNA into such expression systems are
well known to those of ordinary skill in the art. The DNA may also be "naked," as
described, for example, in Ulmer et al., Science 259:1745-1749 (1993) and reviewed by
Cohen, Science 259:1691-1692 (1993). The uptake of naked DNA may be increased
by coating the DNA onto biodegradable beads, which are efficiently transported into
the cells.

While any suitable carrier may
be employed in the pharmaceutical compositions of this invention, the type of carrier
will vary depending on the mode of administration. For parenteral administration, such
as subcutaneous injection, the carrier preferably comprises water, saline, alcohol, a fat,
a wax or a buffer. For oral administration, any of the above carriers or a solid carrier,
such as mannitol, lactose, starch, magnesium stearate, sodium saccharine, talcum,
cellulose, glucose, sucrose, and magnesium carbonate, may be employed.
Biodegradable microspheres (e.g., polylactate polyglycolate) may also be employed as
carriers for the pharmaceutical compositions of this invention.

Any of a variety of adjuvants may be employed in the vaccines of this
invention to nonspecifically enhance the immune response. Most adjuvants contain a
substance designed to protect the antigen from rapid catabolism, such as aluminum
hydroxide or mineral oil, and a nonspecific stimulator of immune responses, such as
lipid A, Bordetella pertussis or Mycobacterium tuberculosis-derived proteins. Suitable
adjuvants are commercially available as, for example, Freund's Incomplete Adjuvant
and Complete Adjuvant (Difco Laboratories, Detroit, MI), Merck Adjuvant 65 (Merck
and Company, Inc., Rahway, NJ), alum, biodegradable microspheres, monophosphoryl
lipid A and quil A. Cytokines, such as GM-CSF or interleukin-2, -7, or -12, may also
be used as adjuvants.

The above pharmaceutical compositions and vaccines may be used, for
example, for the therapy of cancer in a patient. As used herein, a "patient" refers to any
warm-blooded animal, preferably a human. A patient may or may not be afflicted with
a cancer. Accordingly, the above pharmaceutical compositions and vaccines may be
used to prevent the development of a cancer or to treat a patient afflicted with a cancer.
To prevent the development of a cancer, a pharmaceutical composition or vaccine
comprising one or more polypeptides as described herein (or naked, plasmid or viral
vector DNA encoding such a polypeptide) may be administered to a patient. For
treating a patient with a cancer, the pharmaceutical composition or vaccine may
comprise one or more polypeptides, antibodies or nucleic acid molecules
complementary to DNA encoding a polypeptide as described herein (e.g., antisense
RNA or antisense deoxyribonucleotide oligonucleotides).

For example, tumor cells that express a polypeptide as described herein
may be preferentially killed by administering to a patient a conjugate in which a
cytotoxic agent or "prodrug" is linked to antisense RNA, an antisense
deoxyribonucleotide oligonucleotide or an antibody that binds to such a polypeptide.
As used herein, the term "prodrug" refers to a group that is not itself toxic to the cell,
but that can be rendered toxic after the conjugate is directed to the target cell by the
addition of a second activating compound, such as an enzyme that can convert the
prodrug into an active drug. Any suitable cytotoxic agent (including radionuclides) or
prodrug known to those of ordinary skill in the art may be employed in such methods.
Suitable prodrugs include boron, doxifluridine, or the prodrug precursor of palytoxin.

Routes and frequency of administration, as well as dosage, will vary
from individual to individual. In general, the pharmaceutical compositions and
vaccines may be administered by injection (e.g., intracutaneous, intramuscular,
intravenous or subcutaneous), intranasally (e.g., by aspiration) or orally. Between 1
and 10 doses may be administered over a 52 week period. Preferably, 6 doses are
administered, at intervals of one month, and booster vaccinations may be given
periodically thereafter. Alternate protocols may be appropriate for individual patients.
A suitable dose is an amount of a compound that, when administered as described
above, is capable of promoting an anti-tumor immune response. Such a response can
be monitored by measuring the level of anti-tumor antibodies in a patient or by
vaccine-dependent generation of cytolytic effector cells capable of killing the patient's
tumor cells in vitro. A suitable dose should also be capable of causing an immune
response that leads to an improved clinical outcome (e.g., more frequent remissions,
complete or partial or longer disease-free survival) in vaccinated patients as compared
to non-vaccinated patients. In general, for pharmaceutical compositions and vaccines
comprising one or more polypeptides, the amount of each polypeptide present in a dose
ranges from about 100 µg to about 5 mg. Suitable dose sizes will vary with the size of
the patient, but will typically range from about 0.1 mL to about 5 mL.

The following Examples are offered by way of illustration and not by
way of limitation.



PCT/US97/00398

WO 97/25431



EXAMPLES 
Example 1 

Preparation of B18Ag1 cDNA and Genomic Clones Using Differential Display RT-PCR

This Example illustrates the preparation of cDNA and genomic DNA
molecules encoding B18Ag1 using a differential display screen.

Tissue samples were prepared from breast tumor and normal tissue of a
patient with breast cancer that was confirmed by pathology after removal from the
patient. Normal RNA and tumor RNA were extracted from the samples and mRNA was
isolated and converted into cDNA using a (dT)12AG anchored 3' primer. Differential
display PCR was then executed using a randomly chosen primer (CTTC AACCTQ)
(SEQ ID NO:16). Amplification conditions were standard buffer containing 1.5 mM
MgCl2, 20 pmol of primer, 500 pmol dNTP, and 1 unit of Taq DNA polymerase
(Perkin-Elmer, Branchburg, NJ). Forty cycles of amplification were performed using
94°C denaturation for 30 seconds, 42°C annealing for 1 minute, and 72°C extension for
30 seconds. An RNA fingerprint containing 76 amplified products was obtained.
Although the RNA fingerprint of breast tumor tissue was over 98% identical to that of
the normal breast tissue, a band was repeatedly observed to be specific to the RNA
fingerprint pattern of the tumor. This band was cut out of a silver stained gel and
subcloned into the T-vector (Novagen, Madison, WI) and sequenced.

The sequence of the cDNA, referred to as B18Ag1, is provided in SEQ
ID NO:1. A database search of GENBANK and EMBL revealed that the B18Ag1
fragment initially cloned is 77% identical to the endogenous human retroviral element
S71, which is a truncated retroviral element homologous to the Simian Sarcoma Virus
(SSV). S71 contains a complete gag gene, a portion of the pol gene and an LTR-like
structure at the 3' terminus (see Werner et al., Virology 174:225-238 (1990)). B18Ag1
is also 64% identical to SSV in the region corresponding to the P30 (gag) locus.
B18Ag1 contains three separate and incomplete reading frames covering a region which
shares homology to a wide variety of gag proteins of retroviruses which
infect mammals. In addition, the homology to S71 is not just within the gag gene, but
spans several kb of sequence including an LTR.

B18Ag1-specific PCR primers were synthesized using computer
analysis guidelines. RT-PCR amplification (94°C, 30 seconds; 60°C -> 42°C, 30
seconds; 72°C, 30 seconds, for 40 cycles) confirmed that B18Ag1 represents an actual
mRNA sequence present at relatively high levels in the patient's breast tumor tissue.
The primers used in amplification were B18Ag1-1 (CTG CCT GAG CCA CAA ATG)
(SEQ ID NO:17) and B18Ag1-4 (CCG GAG GAG GAA GCT AGA GGA ATA) (SEQ
ID NO:18) at a 3.5 mM magnesium concentration and a pH of 8.5, and B18Ag1-2
(ATG GCT ATT TTC GGG GGC TGA CA) (SEQ ID NO:14) and B18Ag1-3 (CCG
GTA TCT CCT CGT GGG TAT T) (SEQ ID NO:15) at 2 mM magnesium at pH 9.5.
The same experiments showed exceedingly low to nonexistent levels of expression in
this patient's normal breast tissue (see Figure 1). RT-PCR experiments were then used
to show that B18Ag1 mRNA is present in nine other breast tumor samples (from
Brazilian and American patients) but absent in, or at exceedingly low levels in, the
normal breast tissue corresponding to each cancer patient. RT-PCR analysis has also
shown that the B18Ag1 transcript is not present in various normal tissues (including
lymph node, myocardium and liver) and present at relatively low levels in PBMC and
lung tissue. The presence of B18Ag1 mRNA in breast tumor samples, and its absence
from normal breast tissue, has been confirmed by Northern blot analysis, as shown in
Figure 2.

The differential expression of B18Ag1 in breast tumor tissue was also
confirmed by RNase protection assays. Figure 3 shows the level of B18Ag1 mRNA in
various tissue types as determined in four different RNase protection assays. Lanes 1-12
represent various normal breast tissue samples; lanes 13-25 represent various breast
tumor samples; lanes 26-27 represent normal prostate samples; lanes 28-29 represent
prostate tumor samples; lanes 30-32 represent colon tumor samples; lane 33 represents
normal aorta; lane 34 represents normal small intestine; lane 35 represents normal skin;
lane 36 represents normal lymph node; lane 37 represents normal ovary; lane 38
represents normal liver; lane 39 represents normal skeletal muscle; lane 40 represents a
first normal stomach sample; lane 41 represents a second normal stomach sample; lane
42 represents a normal lung; lane 43 represents normal kidney; and lane 44 represents
normal pancreas. Interexperimental comparison was facilitated by including a positive
control RNA of known β-actin message abundance in each assay and normalizing the
results of the different assays with respect to this positive control.
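Normalization to a shared positive control can be sketched as follows. The data layout, function name and signal values are hypothetical; the only idea taken from the text is that each lane's signal is expressed relative to the positive control run in the same assay, so that lanes from different assays become comparable.

```python
def normalize_to_control(assays):
    """Express each lane's signal as a ratio to the positive control signal
    measured in the same assay.

    `assays` maps an assay name to {"control": signal, "lanes": {lane: signal}}.
    Returns a single {lane: normalized signal} mapping across all assays."""
    normalized = {}
    for name, assay in assays.items():
        control = assay["control"]
        if control <= 0:
            raise ValueError(f"assay {name!r} has no usable positive control signal")
        for lane, signal in assay["lanes"].items():
            normalized[lane] = signal / control
    return normalized


# Hypothetical band intensities from two separate assays.
example_assays = {
    "assay_1": {"control": 200.0, "lanes": {1: 10.0, 13: 400.0}},
    "assay_2": {"control": 100.0, "lanes": {26: 5.0, 28: 50.0}},
}
normalized_lanes = normalize_to_control(example_assays)
```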

RT-PCR and Southern blot analysis have shown the B18Ag1 locus to be
present in human genomic DNA as a single copy endogenous retroviral element. A
genomic clone of approximately 12-18 kb was isolated using the initial B18Ag1
sequence as a probe. Four additional subclones were also isolated by XbaI digestion.
Additional retroviral sequences obtained from these clones (located as shown in Figure
4) are shown as SEQ ID NO:3 - SEQ ID NO:10, where SEQ ID NO:3 shows the
location of the sequence labeled 10 in Figure 4, SEQ ID NO:4 shows the location of the
sequence labeled 11-29, SEQ ID NO:5 shows the location of the sequence labeled 3,
SEQ ID NO:6 shows the location of the sequence labeled 6, SEQ ID NO:7 shows the
location of the sequence labeled 12, SEQ ID NO:8 shows the location of the sequence
labeled 13, SEQ ID NO:9 shows the location of the sequence labeled 14 and SEQ ID
NO:10 shows the location of the sequence labeled 11-22.

Subsequent studies demonstrated that the 12-18 kb genomic clone
contains a retroviral element of about 7.75 kb, as shown in Figures 5A and 5B. The
sequence of this retroviral element is shown in SEQ ID NO:11. The numbered line at
the top of Figure 5A represents the sense strand sequence of the retroviral genomic
clone. The box below this line shows the position of selected restriction sites. The
arrows depict the different overlapping clones used to sequence the retroviral element.
The direction of the arrow indicates whether the single-pass subclone sequence
corresponded to the sense or anti-sense strand. Figure 5B is a schematic diagram of the
retroviral element containing B18Ag1 depicting the organization of genes within
the element. The open boxes correspond to predicted reading frames, starting with a
methionine, found throughout the element. Each of the six likely reading frames is
shown, as indicated to the left of the boxes, with frames 1-3 corresponding to those
found on the sense strand.

Using the cDNA of SEQ ID NO:1 as a probe, a longer cDNA was
obtained (SEQ ID NO:12) which contains minor nucleotide differences (less than 1%)
compared to the genomic sequence shown in SEQ ID NO:11.

Example 2

Preparation of B18Ag1 DNA from Human Genomic DNA

This example illustrates the preparation of B18Ag1 DNA by
amplification from human genomic DNA.

B18Ag1 DNA may be prepared from [...] human genomic DNA
using 20 pmol of B18Ag1-specific primers, [...] 1 unit of Taq DNA
polymerase (Perkin Elmer, Branchburg, NJ), [...]
parameters: [...] touchdown annealing in 2°C increments [...]. The
last increment (a 42°C annealing temperature) should cycle 25 times. Primers
(B18Ag1-1, B18Ag1-2, B18Ag1-3 and B18Ag1-4) were selected using computer
analysis. Primer pairs that may be used are 1+3, 1+4, 2+3,
and 2+4.

Following gel electrophoresis, the band corresponding to B18Ag1 DNA
may be excised and cloned into a suitable vector.

Example 3

Preparation of B18Ag1 DNA from Breast Tumor cDNA

This example illustrates the preparation of B18Ag1 DNA by
amplification from human breast tumor cDNA.

First strand cDNA is synthesized from RNA prepared from human
breast tumor tissue in a reaction mixture containing 500 ng poly A+ RNA, 200 pmol of
the primer (T)12AG (i.e., TTT TTT TTT TTT AG) (SEQ ID NO:19), 1X first strand
reverse transcriptase buffer, 6.7 mM DTT, 500 mmol dNTPs, and 1 unit AMV or
MMLV reverse transcriptase (from any supplier, such as Gibco-BRL (Grand Island,
NY)) in a final volume of 30 µl. After first strand synthesis, the cDNA is diluted
approximately 25-fold and 1 µl is used for amplification as described in Example 2.
While some primer pairs can result in a heterogeneous population of transcripts, the
primers B18Ag1-2 (5'ATG GCT ATT TTC GGG GGC TGA CA) (SEQ ID NO:14)
and B18Ag1-3 (5'CCG GTA TCT CCT CGT GGG TAT T) (SEQ ID NO:15) yield a
single 151 bp amplification product.
Example 4

Identification of B-cell and T-cell Epitopes of B18Ag1

This Example illustrates the identification of B18Ag1 epitopes.
The B18Ag1 sequence can be screened using a variety of computer
algorithms. To determine B-cell epitopes, the sequence can be screened for
hydrophobicity and hydrophilicity values using the method of Hopp, Prog. Clin. Biol.
Res. 172B:367-77 (1985) or, alternatively, Cease et al., J. Exp. Med. 164:1779-84
(1986) or Spouge et al., J. Immunol. 138:204-12 (1987). Additional Class II MHC
(antibody or B-cell) epitopes can be predicted using programs such as AMPHI (e.g.,
Margalit et al., J. Immunol. 138:2213 (1987)) or the methods of Rothbard and Taylor
(e.g., EMBO J. 7:93 (1988)).
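A hydrophilicity screen of the kind referred to above can be sketched with a sliding window. The values below are the widely used Hopp-Woods hydrophilicity scale; whether the cited reference uses exactly these values and a six-residue window is an assumption, and the function names and test sequence are hypothetical. Unknown residues (such as Xaa) are scored as zero, which is also an illustrative choice.

```python
# Hopp-Woods hydrophilicity values (assumed scale, see lead-in).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(sequence, window=6):
    """Return (1-based start, segment, mean hydrophilicity) for every window."""
    seq = sequence.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        score = sum(HOPP_WOODS.get(aa, 0.0) for aa in segment) / window
        profile.append((start + 1, segment, score))
    return profile


def most_hydrophilic(sequence, window=6):
    """Window with the highest mean hydrophilicity (first one on ties)."""
    return max(hydrophilicity_profile(sequence, window), key=lambda item: item[2])


# Hypothetical test sequence: a charged stretch followed by a hydrophobic one.
best_window = most_hydrophilic("KKKKKKLLLLLL")
```

Windows with the highest mean score are candidate surface-exposed regions; peptides of the length discussed in the next paragraph would then be chosen around such peaks.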

Once peptides (15-20 amino acids long) are identified using these
techniques, individual peptides can be synthesized using automated peptide synthesis
equipment (available from manufacturers such as Applied BioSystems, Inc., Foster
City, CA) and techniques such as Merrifield synthesis. Following synthesis, the
peptides can be used to screen sera harvested from either normal or breast cancer patients
to determine whether patients with breast cancer possess antibodies reactive with the
peptides. Presence of such antibodies in breast cancer patients would confirm the
immunogenicity of the specific B-cell epitope in question. The peptides can also be
tested for their ability to generate a serologic or humoral immune response in animals
(mice, rats, rabbits, chimps etc.) following immunization in vivo. Generation of a
peptide-specific antiserum following such immunization further confirms the
immunogenicity of the specific B-cell epitope in question.

To identify T-cell epitopes, the B18Ag1 sequence can be screened using
different computer algorithms which are useful in identifying 8-10 amino acid motifs
within the B18Ag1 sequence which are capable of binding to HLA Class I MHC
molecules (see, e.g., Rammensee et al., Immunogenetics 41:178-228 (1995)).
Following synthesis, such peptides can be tested for their ability to bind to class I MHC
using standard binding assays (e.g., Sette et al., J. Immunol. 153:5586-92 (1994)) and,
more importantly, can be tested for their ability to generate antigen reactive cytotoxic
T-cells following in vitro stimulation of patient or normal peripheral mononuclear cells
using, for example, the methods of Bakker et al., Cancer Res. 55:5330-34 (1995);
Visseren et al., J. Immunol. 154:3991-98 (1995); Kawakami et al., J. Immunol.
154:3961-68 (1995); and Kast et al., J. Immunol. 152:3904-12 (1994). Successful
in vitro generation of T-cells capable of killing autologous (bearing the same class I
MHC molecules) tumor cells following in vitro peptide stimulation further confirms the
immunogenicity of the B18Ag1 antigen. Furthermore, such peptides may be used to
generate murine peptide and B18Ag1 reactive cytotoxic T-cells following in vivo
immunization in mice rendered transgenic for expression of a particular human MHC
Class I haplotype (Vitiello et al., J. Exp. Med. 173:1007-15 (1991)).
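The first step of such a screen, enumerating every 8-10 residue window and filtering by anchor residues, can be sketched as follows. The anchor motif shown (position 2 and the C-terminal residue) is a hypothetical placeholder of the general kind tabulated for Class I molecules; it is not taken from the specification and should not be read as the motif of any particular HLA allele.

```python
def candidate_peptides(sequence, min_len=8, max_len=10):
    """Return (1-based start, peptide) for every window of min_len..max_len residues."""
    seq = sequence.upper()
    peptides = []
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            peptides.append((start + 1, seq[start:start + length]))
    return peptides


def matches_anchor_motif(peptide, anchors):
    """`anchors` maps a 1-based position to its allowed residues; negative
    positions count from the C-terminus (-1 is the last residue)."""
    for position, allowed in anchors.items():
        index = position - 1 if position > 0 else position
        if peptide[index] not in allowed:
            return False
    return True


# Hypothetical anchor motif, for illustration only.
example_motif = {2: "LM", -1: "VL"}

windows = candidate_peptides("A" * 12)
hit = matches_anchor_motif("ALAAAAAAV", example_motif)
miss = matches_anchor_motif("AAAAAAAAV", example_motif)
```

Peptides passing such a filter are only candidates; binding and T-cell assays of the kind cited above remain the actual test.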

A representative list of predicted B18Ag1 B-cell and T-cell epitopes,
broken down according to predicted HLA Class I MHC binding antigen, is shown
below:

B-cell epitopes:

QGAAQKPINLSKXIEVVQGHDE (SEQ ID NO:21)
SPGVFLEHLQEAYRIYTPFDLSA (SEQ ID NO:22)

T-cell epitopes:

GAAQKPINL (SEQ ID NO:24)
NLSKXIEVV (SEQ ID NO:25)
EVVQGHDES (SEQ ID NO:26)
HLQEAYRIY (SEQ ID NO:27)
NLAFVAQAA (SEQ ID NO:28)
FVAQAAPDS (SEQ ID NO:29)
From the foregoing, it will be appreciated that, although specific
embodiments of the invention have been described herein for the purpose of
illustration, various modifications may be made without deviating from the spirit and
scope of the invention.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Corixa Corporation

(ii) TITLE OF INVENTION: COMPOUNDS AND METHODS FOR THE TREATMENT 
AND DIAGNOSIS OF CANCER 

(iii) NUMBER OF SEQUENCES: 29 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: SEED and BERRY LLP 

(B) STREET: 6300 Columbia Center, 701 Fifth Avenue

(C) CITY: Seattle 

(D) STATE: Washington 

(E) COUNTRY: USA 

(F) ZIP: 98104-7092 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 10-JAN-1997 

(C) CLASSIFICATION: 

(viii) ATTORNEY/ AGENT INFORMATION: 

(A) NAME: Maki, David J.

(B) REGISTRATION NUMBER: 31,392
(C) REFERENCE/DOCKET NUMBER: 210121.418PC
(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (206) 622-4900 

(B) TELEFAX: (206) 682-6031 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 415 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
TTGANTGTCA AAAACCTTNT AGGCTATCTC TAAAAGCTGA CTGGTATTCA TTCCAGCAAA 60 

ATCCCTCTAG TTTTTGGAGT TTCCTTTTAC TATCTGGGGC TGCCTGAGCC ACAAATGCCA 120 

AATTAAGAGC ATGGCTATTT TCGGGGGCTG ACAGGTCAAA AGGGGTGTAA ATCCGATAAG 180 

CCTCCTGGAG GTGCTCTAAA AACACTCOTG GTGACTCATC ATGCCCCTGG ACGACTTCAA 240 

TCGNCTTAGA CAAGTTTATA GGTTTCTGGG CAGTCCCTGA ATACCCACGA GGAGATACCG 300 

GTGGAAATCG TCAAAAGTTC TCCCTCCACT TGAGAAATTX GGGTCCCAAT TAGGTCCCAA 360
TTGGGTCTCT AATCACTATT CCTCTAGCTT CCTCCTCCGG i#ATTGGTT;GATG%ife • 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 96 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Trp Asp Pro Asn Phe Ser Ser Gly Gly Arg Thr Phe Asp Asp Phe His
1               5                   10                  15

Arg Tyr Leu Leu Val Gly Ile Gln Gly Ala Ala Gln Lys Pro Ile Asn
            20                  25                  30

Leu Ser Lys Xaa Ile Glu Val Val Gln Gly His Asp Glu Ser Pro Gly
        35                  40                  45

Val Phe Leu Glu His Leu Gln Glu Ala Tyr Arg Ile Tyr Thr Pro Phe
    50                  55                  60

Asp Lys Ser Ala Pro Glu Asn Ser His Ala Leu Asn Leu Ala Phe Val
65                  70                  75                  80

Ala Gln Ala Ala Pro Asp Ser Lys Arg Lys Leu Gln Lys Leu Glu Gly
                85                  90                  95

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1180 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

NCNNNNNTTA TGATTACGCC AAGCGNGCAA TTMCCCTCA CTAAAGGGAA CAAAAGCTGG 60 

AGCTCCACCG CGGTGGCGGC CGCTAGAATC TTCATACCCC GAACTCTTGG GAAAACTTTA 120 

ATCAGTCACC TACAGTCTAC CACCCATTTA GGAGGAGCAA AGCTACCTCA GCTCCTCCGG 180

AGCCGTTTTA AGATCCCCCA TCTTCAAAGC CTAACAGATC AAGCAGCTCT CCGGTGCACA 240 

ACCTGCGCCC AGGTAAATGC CAAAAAAGGT CCTAAACCCA GCCCAGGGCA CCGTCTCCAA 300 

GAAAACTCAC CAGGAGAAAA GTGGGAAATT GACTTTACAG AAGTAAAACC ACACCGGGCT 360 

GGGTACAAAT ACCTTCTAGT ACTGGTAGAC ACCTTCTCTG GATGGACTGA AGCATTTGCT 420 

ACCAAAAACG AAACTGTCAA TATGGTAGTT AAGTTTTTAC TCAATGAAAT CATCCCTCGA 480 

CGTGGGCTGC CTGTTGCCAT AGGGTCTGAT AATGGAACGG CCTTCGCCTT GTCTATAGTT 540 

TAATCAGtCA GTAAGGCGH AAACATTCAA TGGAAGCTCC ATTGTGCCTA TCGACCCAGA 600 

GCTCTGGGAA GTAGAACGCA TGAACTGCAC CCTAAAAAAA CACTCTTACA AAATTAATCT 660 
TAAAAACCGG TGTTAATTGT GTTAGTCTCC TTCCCTTAGC CCTACTTAGA GTTAAGGTGC 720

ACCCCTTACT GGGGTGGGTT CTTTACCTTT TGAAATCATN TTTNGGAAGG GGCTGCCTAT 780

CTTTNCTTAA CTAAAAAANG CCO\TTTGGC AAAAATTTCN CAACTAATTT NTACGTNCCT 840 

ACGTCTCCCC AACAGGTANA AAAATCTNCT GCCCTTTTCA AGGAACCATC CCATCCATTC 900 
CTNAACAAAA GGCCTGCCNT TCTTCCCCCA GTTAACTNTT 7TTTNTTAAA ATTCCCAAAA 960

AANGAACCNC CTGCTGGAAA AACNCCCCCC TCCAANCCCC GGCCNAAGNG GAAGGTTCCC 1020 

TTGAATCCCN CCCCCNCNAA NGGCCCGGAA CCNTTAAANT NGTTCCNGGG GGTNNGGCCT 1080 

AAAAGNCCNA TTTGGTAAAC CTANAAATTT TTOTTTTNT AAAAACCACN NTTTNNTTTT 1140 

TCTTAAACAA AACCCTNTTT NTAGNANCNT ATTTCCCNCC 1180

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1163 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

(iTNCTTTGATA CCCNAGCGTT CAATTAACCG TCAGTAAAGG GAAGAAAAGC TGGAGCTCGA 60

CCGCGGTGGC GGCCGCTCTA GAGCTGCGGC /TGfiATCCCGC CAGAGTGAGG AGACCTGAAG 120

ACCAGAGAAA ACACAGGAAG TAGGCCCTTT AAACTACTCA CCTGTGTTGT CTTCTAATTT 180

ATTCTGTTTT ATTTTGTTTC CATCATTTTA AGGGGTTAAA ATCATCTTGT TCAGACCTCA 240

GCATATAAAA TGACCCATCT GTAGACCTCA GGCTCCAACC ATACCCCAAG AGTTGTCTGG 300

TTTTGTTTAA ATTACTGCCA GGTTTCAGCT GCAGATATCC CTGGAAGGAA TATTCCAGAT 360

TCCCTGAGTA GTTTCCAGGT TAAAATCCTA TAGGCTTCTT CTGTTTTGAG GAAGAGTTCC 420

TGTCAGAGAA AAACATGATT TTGGATTTTT AACTTTAATG CTTGTGAAAC GCTATAAAAA 480 

AAATTTTCTA CCCCTAGCTT TAAAGTACTG TTAGTGAGM ATTAAAATTC CTTCAGGAGG 540 

ATTAAACTGG CATTTCAGTT ACCCTAATTC CAAATGTTTT GGTGGTTAGA ATCTTCTTTA 600 

ATGTTCTTGA AGAAGTGTTT TATATTTTCC CATCNAGATA AATTCfCTCN CNCCTTNNTT 660 

TTNTNTCTNN TTTTTTAAAA CGGANTCTTG CTCCGTTGTC CANGCTGGGA ATTTTNTTTT 720 

GGCCAATCtC CGCTNCCTTG GAANAATNCT GCNTCGCAAA ATTACCNGCT TTTTCCCACC 780 

f GCACCCCNN GGAATTACCT GGAATTANAG GGCCCGNCCC CCCCGCCGGC TAATTtGTTT 840 

TTGTTTTTAG TAAAAAACGG GTTTCCTGTT TTAGTTAGGA TGGCCCANNT CTGACGCCNT 900 

NATCNTCCCC CTCNGCCCTC NAATMTTNGG NNTANGGCTT ACGCCCGCCN GNNGtTTTTC 960 

CTCCATTNAA AT7TTCTNTG GANTCTTGAA TNNCGGGTTt tCCCTTTTM ACCNATTTTT 1020 

tTTTTNNNNC CCCCANTTTT NCCTCCCCCN TNBJTAANGG GGGTTTCCCA ANCCGGGTCC 1080 

NCCCCCANGT CCCCAATTTT TGTCCCCCCC CCTCTTTTTT CTTTNCCCCA AAANTCCTAT 1140 

CTTTTCCTNN AAATATCNAN TNT 1163

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1122 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

NNGGTCCNNC TCAAAGTCAN TATAGGGCGA ATTGGGTACC GGGCCCCCCC TCGAGGTCGA 60 

CGGTATCGAT AAGCTTGATA TCGAAnCCT GCAGCCCGGG GGATCCACTA GTTCTAGACC 120

AAGAAATGGA GGATTTJAGA GTGACTGATG ATTTCTCTAT CAJCTGCAGT TAGTAAACAX 180

TCTCCACAGT TTATGCAAAA AGTAACAAAA CCACTGCAGA TGACAAACAC TAGGTAACAC 240 

ACATACTATC TCCCAAATAC CTACCWCAA GCTC^CAAT TTTAMCTGT TAG^TCA^CTj 300

GGCTCTMTC ACCATGACAJ GAGGTCACCA CCAAACCATC AAGCGCTAAA C^G AC AGAAI 360

GTTTCCACTC CTGATCCACT GTGTGGGAAG AAGCACGGAA CTTACCCACT GGGGGGCCTG 420

CNTCANAANA AAAGCCCATG CCCCCGGGTN TNCCTTTNAA CCGGAACGAA TNAACCCACC 480

ATCCCCACAN CTCCTCTGH CNTGGGCCCT GCATCTTGTG GCCTCN7WTN CTTTNGGGGA 540

NACNTGGGGA AGGTACCCCA TTTCNTTGAC CCCNCNANAA AACCCCNGTG GCCCTTTGCC 600

CTGATTCNCN TGGGCCTTTT CTCTTTTCCC TTTTGGGTTG TTTAAATTCC CAATGTCCCC 660 
NGAACCCTCT CCNTNCTGCC CAAAACCTAC CTAAATTNCT GI^CT^GJNNJ - 
TTNCTTTTCA AAGGTNACCT TNCCTGTTCA NNCCCNACNA AMTTTNTTG CNTATfjNTGG 780

NCCCNNAAAA ANNNATCNNC CCNAATTGCC CGAATTGGTT NGGTTTTTCC TNCTGGGGGA 840 

AACCCTTTAA ATTTCCCCCT TGGCCGGCCC CCCTTTTTTC CCCCCTTTNG AAGGCAGGNR 900 
GGTTCTTCCC GAACTTCCAA TTNCAACAGC CNTGCCCATT GNTGAAACCC TTTTCCTAAA 960

ATTAAAAAAT ANCCGG7TNN GGNNGGCCTC TTTCCCCTCC N(3GNGGGNNG NGAAANTC 1020 

TACCCCNAAA AAGGTTGCTT AGCCCCCNGT CCCCACTCCC CCNGGAAAAA TNAACCTTTT 1080 

CNAAAAAAGG AATATAANTT TNCCACTCCT TNGTTCTGTT CC 1122 
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1091 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

NCNNNCCNTT TGTNAAAGAC CGNCAGTGAG CGCGCGTAAT ACGACTCACT ATAGGGCGAA 60 

TTGGGTACCG GGCCCCCCCT CGAGGTCGAC GGTATCGATA AGCTTGATAT CGAATTCCTG 120 

CAGCCCGGGG GATCCACTAG TTCTAGAGCT CGCGGCCGCG AGCTCTAATA CGACTCACTA 180 

TAGGGCGTCG ACT(^TCTC AGCTCACTGC AATCTCTGCC CCCGGGGTCA TGGGATTCTC 240 

CTGCCTCAGC CTTCCAAGTA GCTGGGATTA CAGGCGTGCA ACACCACACC C^GCTAATTt/ 300 

TGTATTTTTA ATAGAGATGG GGTTTTCCCT TGTTGGCCAN NATGGTCTCN AACCCCTGAC 360 

CTCNNGTGAT CCCCCCNCCC NNGANCTCNN ACTGCTGGGG ATNNCCGNNN NNNNCCTCCC 420

NNCNCNNNNN NNCNCNNTCG NTNNTCCTTN CTCNNNNNNN NCNNTCNNTC CNNCTTCTCN 480

CCMNNTNTTN TCNNCNNCGN NCNNNCCNCN TNCCCNCNNN TTCNCNTNCN NTNTCCNNCN 540

NNNTCNNCNN NCNNNNCNTN NCGNNTACNT CNTNNNCNNN TCCNTCTNTN NCGTCNNCNN 600

TCNCTNCNCN TTNTCTCCTC NNTNNNNNNC TCCNNNNNTC TCNTCNCNNC NTNCGTCNNT 660

NNCCNCNCCC CNCCTCNCNN CCTNNTTTNN NCNNCNNNTC CNTNCCNTTC NNNTCCNNTN 720

NCNNCNTCNC NNNCNTTNTT CCCNCCNNTT CCTTNCNCNT NNNNTNTCNN NCNCNTCNNT 780

CNTTTNCTCC TNNNTCCCNN CTCNNTTCNC CCNNNTCGNC CCCCCNCCTN TCTCTCNCCC 840

NNNTNNNTNT NNNNCNTCCN CTNTCNCNTT CNTCNNTNCN TTNCTNTCNN CNNCNNTNCN 900

CTNCCNTNTN TCTNNNTCNC NTCNCNTNTC NCCNTCCNTT NCTNTCTCCT NTNTCCTTCC 960

CCTCNCCTNC TCNTTCNCCN CCCNNTNTNT NTNNCNCCNN TNCTNNNCNN CCNTCNTTTC 1020

NTCTCTNCTN NNNNTNNCCT CNNCCCNTNC CCTNNTNCNC TNCTNNTACC NTNGTNGTCC 1080

NTCTTCCTTC C 1091



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1165 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

NCNNNTTATG ATTACGCCNA CGNNCAA7TA ACCTCACTAA AGGGAACAAA AGCTGGAGCT 60

CCACCGCGGT GGCGGCCGCT CTAGAGCTCG CGGCCGCGAG CTCAATTAAC CCTCACTAAA 120

GGGAGTCGAC TCGATCAGAC TGTTACTGTG TCTATGTAGA AAGAAGTAGA CATAAGAGAT 180

TCCATTTTGT TCTGTACTAA GAAAAATTCT TCTGCCTTGA GATGCTGTTA ATCTGTAACC 240

CTAGCCCCAA CCCTGTGCTC ACAGAGACAT GTGCTGTGTT GACTCAAGGT TCAATGGATT 300 

TAGGGCTATG CTTTGTTAAA AAAGTGCTTG AAGATAATAT GCTTGTTAAA AGTCATCACC 360 

ATTCTCTAAT CTCAAGTACC CAGGGACACA ATACACTGCG GAAGGCCGCA GGGACCTCTG 420 

TCTAGGAAAG CCAGGTATTG TCCAAGATTT CTCCCCATGT GATAGCCTGA GATATGGCCT 480 

CATGGGAAGG GTAAGACCTG ACTGTCCCCC AGCCCGACAT CCCCCAGCCC GACATCCCCC 540 

AGCCCGACAC CCGAAAAGGG TCTGTGCTGA GGAAGATTAN TAAAAGAGGA AGGCTCTTTG 600 

CATTGAAGTA AGAAGAAGGC TCTGTCTCCT GCTCGTCCCT GGGCAATAAA ATGTCTTGGT 660

GTTAAACCCG AATGTATGTT CTACTTACTG AGAATAGGAG AAAACATCCT TAGGGCTGGA 720

GGTGAGACAC CCTGGCGGCA TACTGCTCTT TAATGGACGA GATGTTTGTN TMTTGCCAT 780

CCAGGGCCAN CCCCTTTCCT TAACTTTTTA TGANACAAAA ACTTTGTTCN CTTTTCCTGC 840

GAACCTCTCC CCGTATTANC CTATTGGCCT GCCCATCCCC TCCCC^lAANG GTGAAAANAT 900

GTTCNTAAAT NCGAGGGAAT CCAAAACNTT TTCCCGTTGG TCCCCTTTCC AACCCCGTCC 960

CTGGGCCNNT TTCCTCCCCA ACNTGTCCCG GNTCCTTCNT TCCCNCCCCC TTCCCNGANA 1020

AAAAACCCCG TNTGANGGNG CCCCCTCAAA TTATAACCTT TCCNAAACAA ANNGGTTCNA 1080

AGGTGGTTTG NTTCCGGTGC GGCTGGCCTT GAGGTGCCGC CTNCACCCCA ATTTGGAANC 1140

CNGTTTTTTT TATTGCCCNN TCCCC 1165

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1177 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

NCCNTTTAGA TGTTGACAAN NTAAACAAGC NGCTCAGGCA GCTGAAAAAA GCCACTGATA 60

AAGCATCCTG GAGTATCAGA GTTTACTGTT AGATCAGCCT CATTTGACTT CCCCTCCCAC 120

ATGGTGTTTA AATGCAGCTA CACTACTTCC TGACTCAAAC TCCACTATTC CTGTTCATGA 180

CTGTCAGGAA CTGTTGGAAA CTACTGAAAC TGGCCGACCT GATCTTCAAA ATGTGCCCCT 240

AGGNV^^ TTCCTCGAGA AGGGACTACG 300

AGGGGCCGGT GCANCTGTTA CCAAGGAGAC TNATGTGTTG TGGGCTCAGG CTTTACCANC 360

AAACACCTCA NCNCNNAAGG CTGAATTGAT CGCCCTCACT CAGGCTCTCG GATGGGGTAA 420

GGGATATTaX CGTTAACACT GACAGCAGGT ACGCCTTTGC TACTGTGCAT GTACGTGGAG 480



WO 97/25431 35 PCT/US97/00398



CCATCTACCA GGAGCGTGGG CTACTCACTC GGCAGGTGGC TGTNATCCAC TGTAAANGGA 540

CATCAAAAGG AAAACNNGGC TGTTGCCCGT GGTAACCANA AANCTGATCN NCAGCTCNAA 600

GATGCTGTGT TGACTTTCAC TCNCNCCTCT TAAACTTGCT GCCCACANTC TCCTTTCCCA 660

ACCAGATCTG CCTGACAATC CCCATACTCA AAAAAAAAAN AANACTGGCC CCGAACCCNA 720 

ACCAATAAAA ACGGGGANGG TNGGTNGANC NNCCTGACCC AAAAATAATG GATCCCCCGG 780

GCTGCAGGAA TTCAATTCAN CCTTATCNAT ACCCCCAACN NGGNGGGGGG GGCCNGTNCC 840 

CATTlfcCCCT NTATTNATTC TTTNNCCCCC CCCCCGGCNT CCTTTTTNAA CTCGTGAAAG 900 

GGAAAACCTG NCTTACCAAN TTATCNCCTG GACCNTCCCC TTCCNCGGTN GNTTAMAAAX 960 

AAAAGCCCNC ANTCCCNTCC NAAATTTGCA CNGAAAGGNA AGGAATTTAA CCTTTATTTT 1020 

TTNNTCCTTT ANTTTGTNNN CCCtCTTTTA CCCAGGCGAA CNGCCATCNf" tf AANAAAAA 1080 

AAANAGAANG TTTATTTTTC CTTNGAACCA TCCCAATANA AANCACCCGC NGGGGAACGG 1140 

GGNGGNAGGC CNCTCACCCC CTTTNTGTNG GNGGGNC 1177

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1146 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

NCCNNTTNNT GATGTTGTCT TTTTGGCCTC TCTTTGGATA CTTTCCCTCT CTTCAGAGGT 60

GAAMGGGTC AAAAGGAGCT GTTGACAGTC ATCCCAGGTG GGCCAATGTG TCCAGAGTAC 120

AGACTCCATC AGTGAGGTCA AAGCCTGGGG CTTTTCAGAG AAGGGAGGAT TATGGGTTTT 180

CCAATTATAC AAGTCAGAAG TAGAAAGAAG GGACATAAAC CAGGAAGGGG GTGGAGCACT 240

CATCACCCAG AGGGACTTGT GCCTCTCTCA GTGGTAGTAG AGGGGCTACT TCCTCCCACC 300 

ACGGTTGCAA CCAAGAGGCA ATGGGTGATG AGCCTACAGG GGACATANCC GAGGAGACAT 360 

GGGATGACCC TAAGGGAGTA GGCTGGTTTT AAGGCGGTGG GACTGGGTGA GGGAAACTCT 420 

CCTCTTCTTC AGAGAGAAGC AGTACAGGGC GAGCTGAACC GGCTGAAGGT CGAGGCGAAA 480 

ACACGGTCTG GCTCAGGAAG ACCTTGGAAG TAAAATTATG AATGGTGCAT GAATGGAGCC 540

ATGGAAGGGG TGCTCCTGAC CAAACTCAGC CATTGATCAA TGTTAGGGAA ACTGATCAGG 600

GAAGCCGGGA ATTTCATTAA CAACCCGCCA CACAGCTTGA ACATTGTGAG GTTCAGTGAC 660

CCTTCAAGGG GCCACTCCAC TCCAACTTTG GCCATTCTAC TTTGCfJAAAT 1TCCAAAACT 720

TCCTTTTTTA AGGCCGAATC CNTANTCCCT NAAAAACNAA AAAAAATCTG CNCCTATTCJ 780 

GGAAAAGGCC CANCCCTTAC CAGGCTGGAA GAMTTTTNC CI 1 11 1 1 1 1 I.^JJTTTGAAGG 840 

CNTTTNTTAA ATTGAACCTN AATTCNCCCC CCCAAAAAAA AACCCNCCNG GQG^CGGAT 900

TTCCAAAAAC NAATTCCCTT ACCAAAAAAC AAAAACCCNC CCTTNTTCCC TTCCNCCCTN 960 

TTCTTTTAAT TAGGGAGAGA TNAAGCCCCC CAATTTCCNG GNCTNGATNN GTTTCCCCCC 1020 
CCCCCATTTT CCNAAACTTT TTGCGANCNA GGAANCCNCC CTTTTTTTNG GTCNGATTNA 1080

NCAACCTTCC AAACCATTTT TCCNNAAAAA NTTTGNTNGG NGGGAAAAAN ACCTNNTTTT 1140

ATAGAN 1146

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 545 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CTTCATTGGG TACGGGCCCC CTCGACCTCG ACGGTATCGA TAAGCTTGAT ATCGAATTCC 60 

TGCAGCCCGG GGGATCCACT AGTTCTAGAG TCAGGAAGAA CCACCAACCT TCCTGATTTT 120 

TATTGGCTCT GAGTTCTGAG GCCAGTTTTC TTCTTCTGTT GAGTATGCGG GATTGTCAGG 180 

CAGATCTGGC TGTGGAAAGG AGACTGTGGG CAGCAAGTTT AGAGGCGTGA CTGAAAGTCA 240 

CACTGCATCT TGAGCTGCTG AATCAGCTTT CTGGTTACCA CGGGCAACAG CCGTGTTTTC 300 

CTTTTGATGT CCTTTACAGT GGATTACAGC CACCTGaGA GGTGAGTAGC CCACGCTCCT 360 

GGTAGATGGC TCCACGTACA TGCACAGTAG CAAAGGCGTA CCTGCTGTCA GTGTTAACGT 420 

TAATATCCTT ACCCCATCGG AGAGCCTGAG TGAGGGCGAT CAATTCAGCC CTTTTGTGCT 480 
GAGGTGTTTG CTGGTTAAGC CCTGAACCCA CAACACATCT GTCTCCATGG TAACAGCTGC 540
ACCGG 545 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9388 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GCTCGCGGCC GCGAGCTCAA TTAACCCTCA CTAAAGGGAG TCGACTCGAT CAGACTGTTA 60 

CTGTGTCTAT GTAGAAAGAA GTAGACATAA GAGATTCCAT TTTGTTCTGT ACTAAGAAAA 120 

ATTCTTCTGC CTTGAGATGC TG1TAATCTG TAACCCTAGC CCCAACCCTG TGCTCACAGA 180 

GACATGTGCT GTGTTGACTC AAGGTTCAAT GGATTTAGGG CTATGCTTTG TTAAAAAAGT 240 

GCTTGAAGAT AATATGCTTG TTAAAAGTCA TCACCATTCT CTAATCTCAA GTACCCAGGG 300 

ACACAATACA CTGCGGAAGG CCGCAGGGAC CTCTGTCTAG GAAAGCCAGG TATTGTCCAA 360 

GATTTCTCCC CATGTGATAG CCTGAGATAT GGCCTCATGG GAAGGGTAAG ACCTGACTGT 420

CCCCCAGCCC GACATCCCCC AGCCCGACAT CCCCCAGCCC GACACCCGAA AAGGGTCTGT 480

GCTGAGGAGG ATTAGTAAAA GAGGAAGGCC TCTTTGCAGT TGAGGTAAGA GGAAGGCATC 540

TGTCTCCTGC TCGTCCCTGG GCAATAGAAT GTCTTGGTGT AAAACCCGAT TGTATGTTCT 600

ACTTACTGAG ATAGGAGAAA ACATCCTTAG GGCTGGAGGT GAGACACGCT GGCGGCAATA 660

CTGCTCTTTA ATGCACGGAG ATGTTTGTAT AAGTGCACAT CAAGGCACAG CACCTTTCCT 720

TAAACTTATT TATGACACAG AGACCTTTGT TCACGTTTTC CTGCTGAGCC TCTCCCCACT 780

ATTACCCTAT TGGCCTGCCA CATCCCCCTC TCCGAGATGG TAGAGATAAT GATCAATAAA 840 

TACTGAGGGA ACTCAGAGAC CAGTGTCCCT GTAGGTCCTC CGTGTGCTGA GCGCCGGTGC 900 

CTTGGGCTCA CTTTTCTUG TCTATAGTTT GTCTCTGTGT CTCTTTCTTT TCTGAGTCTC 960

TCGTTCCACC TGACGAGAAA TACCCAGAGG TGTGGAGGGG CAGGCCACCC GTTGAATAAT 1020

TTACTAGCCT GTTCGCTGAC MCAAGACTG GTGGTGCAGA AGGTTGGGTC TTGGTGTTCA 1080

CCGGGTGGCA GGCATGGGCC AGGTGGGAGG GTCTCCAGCG CCTGGJGGAA ATCTCCAAGA 1140

AAGTGCAGGA AACAGCACCA AGGGTGATTG TAAATTTTGA TTTGGCGCGG CAGGTAGCCA 1200

TTCCAGCGCA AAAATGCGCA GGAAAGCTTT TGGTGTGGTT GTAGGCAGGT AGGCCCGAAG 1260

CACTTCTTAT TGGCTAATGT GGAGGGAACC TGCACATCGA TTGGCTGAAA TCTCCGTCTA 1320

TTTGAGGCTG ACTGAGCGCG TTeCTTTCTT CTGTGTTGCC TGGAAACGGA CTGJCTGCdT 1380

AGTAACATeT GATCACGTTT CCCATTGGCC GCCGTTTCCG GAAGCCCGCC CTCCCATTTG 1440

CGGAAGGCTG GCGCAAGGTT GGTCTGCAGG TGGCCTGCAG GTGCAAAGTG GGAAGTGTGA 1500

: CTCCTf^TC* ^ GGAGGGTCAGy ;) 1560 f t : l 
CAGCGTGGAG TCCTG^^/TTG^ 

AACTGGGACT CAGGCTGCCC CCCGGCCGTT TCTCATCCGT CCACCGGACT CGTGGGCGCT 1680

CGCACTGGCG CTGATGTAGT TTCCTGACCT CTGAGCCGTA TTGTCTCCAG ATTAAAGGTA 1740

AAAACGGGGC TT7TTGAGCC CACTCGGGTA AAACGCCTTT TGATTTCTAG GCAGGTGTTT 1800

TGTTGCACGC CTGGGAGGGA GTGACCCGCA GGTTGAGGTT TATTAAAATA CA7TCCTGGT 1860

TTATGTTATG TTTATAATAA AGCACCCCAA CCTTTACAAA ATCTCACTTT TTGGCAGTTG 1920

TATTATTTAG TGGACTGTGT CTGATAAGGA CAGCCAGTTA AAATGGAATT TTGTTGTTGC 1980

TAATTAAACC MTTTTTAGT TTTGGTGTTT GTCCTAATAG CAACAACTTC TCAGGCTTTA 2040

TAAAACGATA TTTCTTGGGG GAAATTTCTG TGTAAGGCAG AGCGAG7TAS TTTGGM7TG 2100

TTTTAAAGGA AGTAAGTTCC TGGTTTTGAT ATCTTAGTAG TGTMTGGGC AAGGTGGTIT 2160

TTACTAACCC TGTTTTTAGA CTCTCCCTTT CCTTAAATCA CCTAGCCTTG TTTCGACCTG 2220

AATTGAGTCT CCCTTAGCTA AGAGGGCCAG ATGGAGTCGA TGTJGGGf GT TO 2280

GCCCCTTCCT CAAGGACTTA ACTTGTGCAA GCTGACTCCC AGCACATGGA AGAATGCAAT 2340

TAACTGTTM GATACTGTGG CAAGCTATAT CCGCAGTTCG GAG<1AAT?GA TCCGATTGAT 2400 \ ; 

TATGCCGAAA AGCCCCGCGT CTATCACCTT GTAATAATCT TAAAGCCCCT GCACGTGGAA 2460
CTATTAACTT TCCTGTAAGC ATTTATCCTI 2S2Qh^ 
AATTGTTTTA ACTAGACCTC CCCTCCCGTT la/m^^Mfo^ P^mfl&r 2580;^ 
CCCTTCTTCA GAGCG&6AG MTffiGAGC ATTAGCCAf C f CTTGGCGGC' CAG~CTAAAW "2640'^ 

AATGGACTTT TMTTTCTGT <V\AAGTGTGd CGTTTTCTCT AACtOSGTCA GGTAGGAgAT 2700 ;^ 

TTGGAGGCdC CAGCGAGAAA CGTCACCGGG AGAAACGTCA CCGGGCGAGA GCCGGGCCdS 2760

CTGTGTGCTC CCCCGGAAGG ACAGCCAGCT TGTAGGGGGG AGTGCCACCT GAAAAAAAM 2820

TTTCCAGGTC CCCAAAGGGT GACCGTCTTC CGGAGGACAG GGGATCGACT ACCATGCGGG 2880 

TGCCCACCAA AATTGCACCT CTGAGTCCTC MCTGCTGAC CCCGGGGTCA GGTAGGTCAG 2940 

ATTTGACTTT GGTTCTGGCA GAGGGAAGCG ACCCTGATGA GGGTGTCCCT CTTTTGACTC 3000 

TGCCCATTTC TCTAGGATGC TAGAGGGTAG AGCCCTGGTT TTCTGTTAGA CGCCTCTGTG 3060 

TCTCTGTCTG GGAGGGAAGT GGCCCTGACA GGGGCCATCC CTTGAGTCAG TCCACATCCC 3120

AGGATGGTGG GGGACTGAGT CCTGGTTTCT GGCAGACTGG TCTCTCTCTC TCTCTTTTTC 3180

TATCTCTAAT CTTTCCTTGT TCAGGTTTCT TGGAGAATCT CTGGGAAAGA AAAAAGAAAA 3240 

ACTG7TATAA ACTCTGTGTG AATGGTGAAT GAATGGGGGA GGACAAGGGC TTGCGCTTGT 3300 

CCTCCAGTTT GTAGCTCCAC GGCGAAAGCT ACGGAGTTCA AGTGGGCCCT CACCTGCGGT 3360

TCCGTGGCGA CCTCATAAGG CTTAAGGCAG CATCCGGCAT AGCTCGATCC GAGCCGGGGG 3420

TTTATACCGG CCTGTCAATG CTAAGAGGAG CGCAAGTGGC CTaAGGGGGA GCG^CAGGG 3480 

GGGCATCTGA CTGATCCCAT CACGGGACCC CCTCCCCTTG TTTGTCTAAA AAAAAAAAAA 3540 

GAAGAAACTG TCATAACTGT TTACATGCCC TAGGGTCAAC TGTTTGTTTT ATGTTTATTG 3600 

TTCTGTTCGG TGTCTATTGT CTTGTTTAGT GGTTGTCAAG GTTTTGCATG TCAGGACGTG 3660

GATAfTGGGC AAGAGGTCTG J *GGTAAGAACt TGTGCAAGGT CCTTAGItlCT GATTTTTTGl^ r ' 3720 ~ ? 

CACAGGAGGT TAAATTTCTC ATCAATCATT tAGGCTGGCC AGCACAGTGG f GTCTTTtcf ^ 3780: 

GCCAGAACCA AGTCAGGTGT TGTTACGGGA ATGAGTGTAA AAAAACATTC GCCTGATTGG 3840

GATTTCTGGC ACCATGATGG TTGTATTTAG ATTCTCATAC CCCACATCCA GGTTGATTGG 3900

ACCTCCTCTA AACTAAACTG GTGGTGGGJT CAAAACAGGC ACCCTGCAGA TTTCCTTGCT 3960

CACCTGTTTG GTGATTCTGT AACTjFTTCGT GTGCCCTTAA ATAGCACACT GTGTAGGGAA 4020

ACCTACCCTC GTACTGCTTT ACTTCGTTJA GATTCTTACT CTGTTCCTCT GTGGCTACTC 4080

TCCCATCTTA AAAAGGATCC AAGTGGTCCT TTTCCTCCTC CCTGCCCCCT ACCCCACAGA 4140

TCTCGTTTTC GAGTGCGACA GCAAGJTCAG CGTCTCCAGG ACTTGGCTCT GCTCTCACTC 4200

CTTGAACCCT TAAAAGAAAA AGCJGGGTTT GAGCTATTTG CCTTTGAGTC ATGGAGACAC 4260

AAAAGGTATT TAGGGJACAG AT^AGAAGA AGAGAGAGAA CACCTAGATC CAACTGAGGC 4320

AGGAGATCTC GGGCTGGCCT CTAGTCGTCC TCCCTGAATC TTAAAGCTAC AGTGATGTGG 4380

CAAGTGGTAT nAGGpTTG TGGTTTWCt . GCTCTTTCTG . GTGATGTTGA ; TTCTGTTCTT. : 4440 , 

TCGATACTCC AGCGCCCeAG ; GGAGTGAQTT TCTGJGTCTG TGCTGQGTTT GATATCTATG 4500 

TTCAAATCTT ATJAAATTGC CTTCAAAAAA AAAAAAAAAA GGGAAAGACT TCCTCCCAGC 4560 

CTTGJAAGGG TTGGAGGC^T CTGGAGTO 4620 ; 

AGGATTATGG AGTCCGfiCTT AAAAAAGGCA A6CTGTGGAG ACTCTGCAAA GTAGAATGGC 4680 

CAAAGTTTGG AGTTGAGJGG GCCCTTGAAG GGTCAGTGAA CCTCACAATT GTTGAAGCTG 4740 

TGTGGO^^ 4800 

GGCTGAGTTT GGTCAGGAGC ; ACCCCTTO*; ^ CATGCAGCAT TCATAATTTT " 4860 

ACCTCCAA06 TCGTCC^GAG CCAGACGGTG TTTTCGCCTC GACCGTGAGC CGGTTGAGGT 4920 
CGCCCTGTAC fGCCTCfCTC TGAAGAAGAG GAGAGTCTCC CTCACCCAGT CCCACCGSCCt 4980

TAAAACCAGC CTACTCCCTT AGGGTCATCC CATGTCTCCT CGGCTATGTC CCCTGTAGGC 5040 

TCATCACCCA TTGCCTCTTG GTTGCAACCG TGGTGGGAGG AAGTAGCCCC TCTACTACCA 5100 

CTGAGAGAGG CACAAGTCCC TCTGGGTGAT GAGTGCTCCA CCCCCTTCCT GGTTTATGTC 5160 

CCTTCTTTCT ACTTCTGACT TGTATAATTG GAAAACCCAT AATCCTCCCT TCTCTGAAAA 5220 

GCCCCAGGCT TTGACCTCAC TGATGGAGTC TGTACTCTGG ACACATTGGC CCACCTGGGA 5280 

TGACTGTCAA CAGCTCCTTT TGACCCTTTT CACGTCTGAA GAGAGGGAAA GTATCCAAAG 5340

AGAGGCCAAA AAGTACAACC TCACATCAAC CAATAGGCCG GAGGAGGAAG CTAGAGGAAT 5400

AGTGATTAGA GACCCAATTG GGACCTAATT GGGACCCAAA TTTCTCAAGT GGAGGGAGAA 5460

CTTTTGACGA TTTCCACCGG TATCTCCTCG TGGGTATTCA GGGAGCTGCT CAGAAACCTA 5520

TAAACTTGTC TAAGGCGACT GAAGTCGTCC AGGGGCATGA TGAGTCACCA GGAGTGTTTT 5580

TAGAGCACCT CCAGGAGGCT TATCGGATTT ACACCCCTTT TGACCTGGCA GCCCCCGAAA 5640

ATAGCCATGC TCTTAATTTG GCATTTGTGG CTCAGGGAGC CCCAGATAGT AAAAGGAAAC 5700

TCCAAAAACT AGAGGGATTT TGCTGGAATG AATACCAGTC AGCTTTTAGA GATAGCCTAA 5760

AAGGTTT7TG ACAGTCAAGA GGTTGAAAAA CAAAAACAAG CAGCTCAGGC AGCTGAAAAA 5820
AGCWCtGAT AAAGCAl»G^^ 

TCGCCTCCCA CATGGTGTTT AMfCCAGCT AaCTACmXTT^ctCAAA CTCCiACTAtT " 5940 

CCTGTTCATG ACTGTCAGGA ACTGTTGGAA ACTACTGAAA CTGGCCGACC TGATGTTCAA 6000
aaxpxppppp
AAlbjbLLLL 


XAPPAAAPPT f^PATCPPAPP ^TCTTPAPAfc APAfTTAGPAG PTTPPTPGAfi 
\ AbbAAAbb 1 bbA 1 bLLALL b 1 b 1 1 LALAu ALHu 1 Hu^Hu u i i I UUMu 


ouou 


aa^ppaptap 
AAbbbAL IAL 


r a a Anrrrnr tppacptctt appat^aga PAGATGTftTT fiTftfifiPTPAG 

bAAAbbLLuu 1 bLAuL 1 b II AULA 1 uuhuH LHUH lulu! l u i uuuL l uau 




i i i appap 
bLI I.IALLAb 


paaapapptp appapaaaa^: nrTrcAATTCA tpgppptpap tpaggptptp 

LAAALALL 1 L AbLALAAAAb, bL 1 bAA 1 1 bH 1 LuLLL 1 UHL 1 LHuuL 1 L I L 


OlOU 


LGATGGGGTA 


APPATATTAA rpTTA AP A PT PAPAPPAPPX APPPPTTTCP TAPTf^THPAT 

AGGAIAI IAA Lbl 1 AALALl uALAbLAbbl ALbLLI 1 IbL lALIblbLAI 




GTACGiGGAG 


ppatptappa rr/imrTrrr ptaptpappt rACtrACfiTfcd PTZiTAATPPA 
LLA 1 L 1 ALLA GGAbLb 1 bub L IAL 1 LALL 1 LAbLAbb Ibb Lib IAA 1 LLA 


OoUU 


CTGTAAAG6A 


r* axp AAA APP AAAAP APPPP TPTTPPPPPT PPTAAPPAf!A AAf^PTfcATTP 

CATLAAAAbb AAAALALGbL 1 b llbLLLbl bb 1 AALLAbA AMbL 1 bA 1 1 L 




AGCAGCTCAA 


/>ATorAPTr>T r» apt t tpapt pappppxpta AAPTTPPTPP PPAPAPTPTP 

GATGCAGTGT GAlTTTGAGT LALGlL ILIA AAL 1 IbL 1 bL LLALAb ILIL 




CTTTCCACAG 


rrAr> ATfTPr ptp a a a Tnr- r*PPAXAPXPA APAPAAPAAP AAAAPTCPPP 

CCaGATCTGC CTGALAATCl LbLA 1 AL 1 LA ALAuAAb/VAb AAAAL 1 buLL 


0*tQU , 


XP A/"* A A P TV* A 

1 LAGAAC TCA 


PAPPPAAXAA A A AXPAPPA A PPTXPpXPCA TTPTTPPT£A PTPTAftAATP 

GAbLLAA 1 AA AAA 1 LAubAA bb 1 1 bb I bbA 1 1 L 1 1 LL 1 bA L 1 L 1 huhh l L 




ttpaxapppp 
1 1 LA lALLLL 


PAAPTPTTCP CAAAAPTTTA ATPAnTPAPP TAPAfiTPTAP PAPPPATTTA 
bAAL 1 L I 1 bb bAAAAL 1- l : 1 A ; A 1 Lhu 1 LALL ■ 1 HL.Hu I \* l hu uhva»v*h i i i h 


6600 


ppappappa a 
ubAbuAbtAA 


apptapptpa rrTrcTrmc A(ircci \ i 1 1 a agatpppppa TPTTPAAAHP 
AbL 1 ALL 1 LA. bL I LL 1 LLbb AbLLu I I I 1 H huh i u\a»uuh ; i \> i i uhhhul. 


6660 


PXAAPAPAXP 

LIAALflbAIL 


AAPPAPPTPT PPf^TfcPAPA ACCTCC(ZCCC AftCTAAATGP PAAAAAAGGT 
AAbLAbL 1 L I LLbb lAaLALA ALL luLuLLL Huu I HHH i uv. yMHHfvyjuu i 


6720 


pptaaapppa 

LL 1 AAALLLA 


CPPP AGGPPA PPGTPTPPAA GAAAAPTPAP PAftftAPlAAAA GTGfiftAAATT 

ULLLAuuLLA > LLu 1 L 1 LLAA UHHHHol.uHVy l^UWUWvVA u i guunry\ i i 


6780 


CAPTTTAPAC 
bALI 1 lALAb 


AAPTAAAAPP APAPPftfTCPT PJGGTAPAAAT APPTTPTAGT APTGGTAfiAf 
AAb 1 AAAALL ALALLubbL 1 uuu 1 HLAHH l Huy 1 l u I ho i mv^ i uu i «uhv* 


6840 


ACCTTCTCTG GATGGACTGA AGCA1TTGGT ACCAAAAACG AAACTGTCAA TATGGTAGTT 6900


AAGTTt^ 


TCAATGWT j^VlttGT^ 


6960 


AATGGAGGGG 


CCTTCGCGTT GTCJATAGTT TAGTCAGTCA GTAAGGCGTT^ 7^ 


7020 


TGGAAGCTdC 


ATTCTGCCTA T^GACCCW AGCTCTGGGC AAGTAGVV^CG ^ 


7080 
ACCCTAAAAA ACACTCTTAC AAAATTAATC TTAGAAACCG GTGTAAATTG TGTAAGTCTC 7140

CTTCCTTTAG CCCTACTTAG AGTAAGGTGC ACCCCTTACT GGGCTGGGTT CTTACCTTTT 7200 

GAAATCATGT ATGGGAGGGC GCTGCCTATC TTGCCTAAGC TAAGAGATGC GGAATTGGCA 7260 

AAAATATCAC AAACTAATTT ATTACAGTAC CTACAGTCTC CCGAACAGGT ACAAGATATC 7320 

ATCCTGCCAC TTGTTCGAGG AACCCATCCC AATCCAATTC CTGAACAGAC AGGGCCCTGC 7380 

CATTCATTCC CGCCAGGTGA CCTGTTGTTT GTTAAAAAGT TCCAGAGAGA AGGACTCCCT 7440 

CCTGGTTGGA AGAGACCTCA CACCGTCATC ACGATGCCAA CGGCTGTGAA GGTGGATGGC 7500 

ATTCCTGCGT GGATTCATCA CTCCCGCATC AAAAAGGCCA ACGGAGGCCA ACTAGAAACA 7560 

TGGGTGCCCA GGGCTGGGTC AGGCCCCTTA AAACTGCACC TAAGTTGGGT GAAGCCATTA 7620 

GATTAATTCT TTTTGTTAAT TTTGTAAAAC AATGCATAGC TTGTGTCAAA CTTATGTATC 7680 

TTAAGACTCA ATATAACCCC CTTGTTATAA CTGAGGAATC AATGATTTGA TTCCeeAAAA 7740 

ACACAAGTGG GGAATGTAGT GTCCAACCTG GTTTTTACTA ACCCTGTTTT TAGAGTCTCC 7800 

CTTTCGTTTA ATCACTCAGC CTTGTTTCCA CGTGAATTGA CTCTCGGTA GCTAAGAGCG 7860 

CCAGATGGAC TCCATCTTGG CTCTTTCACT GGCAGCCGCT TCGTCAAGGA CTTAAGTTGT 7920

GCAAGGTGAC TCCCAGCACA TCCAAGAATG CAATTAACTG ATAAGATACT GTGGbAAGGT 7980 
ATATCCGCAG TTGCCAGGAA TTCGTCCAAT; TGAHACACC tAfimiOC^iG^^m 

CCTTGTAATA ATCTTAAAGC CCCTGCACCT GGAACTATTA ACGTTCCTGT AAGCATTTAT 8100

CCTTTTAApT TTTTTGCCTA CTTTATTTCT GTAAAATTGT TTTAAGTAGA CCGCCGCTCT 8160

CCTTTCTAAA CCAAAGTATA AAAGCAAATC TAGCCCCTTC TTCAGGCGGA GAGAATTTCG 8220

AGCGTTAGCC GTCTCTTGGC CACCAGCTM ATAMCGGAT TCTTCATGTG TCTGAMGTG 8280 

TGGCGTTTTC TCTAACTCGC TCAGGTACGA CCGTGGTAGT ATTTTCCCCA ACGTCTTATT 8340 

TTTAGGGCAC GTATGTAGAG TAACTTTTAT GAAAGAAACC AG7TAAGGAG GTTTTGGGAT 8400 

TTCCTTTATC AACTGTAATA CTGGTTTTGA TTATTTATTT ATTTATTTAT TTTTTTTGAG 8460

AAGGAGTTTC ACTCTTGTTG CCCAGGCTGG AGTGCAATGG TGCGATGTTG GGTCACTGCA 8520 

ACTTCCGCCT CCCAGGTTCA AGCGATTCTC CTGCCTCAGC CTCGAGAGTA GCTGGGATTA 8580 

TAGGCATGCG CCAGCACACC CAGCTAATTT TGTATTTTTA GTAAAGATGG GGTTTCTTCA 8640 

TGTTGGTCAA GCTGGTCTGG AACTCCCCGC CTCGGGTGAT CTGCCCGCCT CGGCCTCCGA 8700 

AAGTGCTGGG ATTACAGGTG TGATCGACCA CACCCAGCCG ATTTATATGT ATATAAATCA 8760 

CATTCCTCTA ACCAAAATGT AGTGTTTCCT TCCATCTTGA ATATAGGCTG TAGACCCCGT 8820 

GGGTATGGGA CATTGTTAAC AGTGAGACCA CAGCAGTTTT TATGTCATCT GACAGCATCT 8880 

CCAAATAGCC TTCATGGTTG TCACTGCTTC CCAAGACAAT TCCAAATAAC ACTTCCCAGT 8940 

GATGACHGC TACTTGCTAT TGTTACTTAA TGTGTTAAGG TGGCTGTTAC AGACACTATT 9000 

AGTATGTCAG GAATTACACC AAAATTTAGT GGCTCAAAGA ATCATTTTAT TATGTATGTG 9060 

GATTCTCATG GTCAGGTCAG (3ATTTG a yGAC AGGGCAtA/iS GGTAGCGCAC TTGTCTCTGT 9120

CTATGATGTC TGGCCTCAGC ACAGGAGACT CAAGAGGTGG GGTCTGGGAC CATTTGGAGG 9180

CTTGTTCCfcT CACATCTGAT ACCTGGCTTG GGATGTTGGA AGAGGGGGTG AGCTGAGACT 9240

GAGTGCCTAT ATGTAGTGTT TCCATATGGC CTTGACTTCC TTACAGCCTG GCAGCCTCAG 9300

GGTAGTCAGA ATTCTTAGGA GGCACAGGGC TCCAGGGCAG ATGCTGAGGG GTCTTTTATG 9360

AGGTAGCACA GCAAATCCAC CCAGGATC 9388

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3646 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GGGAAAtACT TCCTCCCAGC CnGTAAGGG HGGAGCCCT CTCCAGTATA TGCTGCAGM 60 
TTTTTCTCTC GGTTTCTCAG AGGAtfATGG AGTCCGCCtT AAAAMgIjCA AGCTCTGGAC '" i^b 
ACTCTGCAAA GTAGAATGGC CAAAGTTTGG AGTTGAG^ j^' " 

CCTCACAATT GTTCAAGCTG TGTGGCGGGT TGTTACTGAA ACTCCCGGCC TCCCTGATCA 240
gtttEcctac ATTGATCAAT (^TGAGTTT GGTCAGGAGC ACCCCTTCCG TGGCTCCACT 300 
CATGCACCAT TCATMTttT ACCTCCAA^G TCCT(XT GAG CCAGACCfiTG TTTTCG^CfC ' 360 T 
GACCCTCAGC CGGTTCGGCT CGCCCTGTAC TGCCTCf CTC TGAAGAAGAG (3AGAGTCTCC ''420 
CTCACC^GT (&%M^ - 
CGGCTATGTC CCCTGTAGGC TCATCACCCA TTGtCltTTG GTTGCAACCG TGGTGGGAGG " 540 
AAGTAGCCCC TCTACTACCA CTGAGV\GAC^ CACAAGTCCC TCTG&TGAT GAGTGCTCCA 600 
CCCCCTTCCT GGTTTATGTC CCTTCTTTCT ACTTCTGACT TGTATAATTG GAAAACCCAT 660

AATCCTCCCT TCTCTGAAAA GCCCCAGGCT TTGACCTCAC TGATGGAGTC TGTACTCTGG 720

ACACATTGGC CCACCTGGGA TGACTGTCAA CAGCTCC1TT TGACCCTTTT CACCJCTGAA 780 

GAGAGGGAAA GTATCCAAAG AGAGGCCAAA AAGTACAACC TCACATCAAC CAATAGGCCG 840 

GAGGAGGAAG CTAGAGGAAT AGTGATTAGA GACCCAATTG GGACCTAATT GGGACCPVAA 900 

TTTCTCAAGT GGAGGGAGAA CTTTTGACGA THCC^CCGG TATCTCCTCG TGGGTATTCA 960 

GGGAGCTGCT CAGAAACCTA TAAACTTGTC TAAGGCGACT GAAGTCGTCG AGGGGCATGA 1020

TGAGTCACCA GGAGTGTTTT TAGAGCACCT CCAGGAGGCT TATCAGATTT ACACCCCTJJ 1080 

TGACCTGGCA GCCCCCGAAA ATAGCCATGC TCTTAATTTG GCATTTGTGG CTCAGGCAGC 1140

CCCAGATAGT AAAAGGAAAC TCCAAAAACT AGAGGGATTT TGCTGGAATG MTACCAGTC 1200

AGCTTTTAGA GATAGCCTAA AAGGTTTTTG ACAGTCAAGA GGHGAAAAA CAAAAACAAG 1260

CAGCTCAGGC AGCTGAAAAA AGCCACTGAT AAAGCATCCT GGAGTATCAG AGTTTACTGT 1320

TAGATCAGCC TCATTTGACT TCCCCTCCCA CATGGTGTTT AMTCCAGCT ACACTACTTC 1380

CTGACJCAAA CTCCACTATT CCTGTTCATG ACTGTCAGGA ACTGTTGGAA ACTACTGAAA 1440

CTGGCCGACC TGATC1TCAA AATGTGCCCC TAGGAAAGGT GGATGCCACC ATGTTCACAG 1500

ACAGTAGCAG CTTCCTCGAG AAGGGACTAC GAAAGGCCGG TGCAGCTGTI ACCATGGAjG 1560

CAGATGTGTT GTGGGCTCAG GCTTTACCAG CAAACACCTC AGCACAAAAG GCTGAATTGA 1620

TCGCCCTCAC TCAGGCTCTC CGATGGGGTA AGGATATTAA CGTTAACACT GACAGCAGGT 1680

ACGCCTTTGC TACTGTGCAT GTACGTGGAG CCATCTACCA GGAGCGTGGG CTACTCACCT 1740

CAGCAGGTGG CTGTAATCCA CTGTAAAGGA CATCAAAAGG AAAACACGGC TGTTGCCCGT 1800 

GGTAACCAGA AAGCTGATTC AGCAGCTCAA GATGCAGTGT GACTTTCAGT CACGCCTCTA 1860

AACTTGCTGC CCACAGTCTC CTTTCCACAG CCAGATCTGC CTGACAATCC CGCATACTCA 1920

ACAGAAGAAG AAAACTGGCC TCAGAACTGA GAGGCAATAA AAATCAGGAA GGTTGGTGGA 1980

TTCTTCCIGA CTCTAGAATC TTCATACCCC GAACTCTTGG GAAAACTTTA ATCAGTCACC 2040

TACAGTCTAC CACCGATTTA GGAGGAGCAA AGCTAGCTCA GGTCCTCCGG AGCCGTTTTA 2100

AGATGCCCCA TCTTCAAAGG CTAAGAGATG AAGCAGCTCT CCGGTGCACA ACCTGCGCGC 2160

AGGTAAATGC CAAAAAAGGT CCTAAACCCA GCCGAGGCCA CCGTCTCCAA GAAAACTCAC 2220

CAGGAGAAAA GTGGGAAATT GAGTTTACAG AAGTAAAACC ACACCGGGCT GGGTACAAAT 2280

ACCWCTAGT ACTGGTAGAC AGCTTCTCTG GATGGACTGA AGCATTTGCT ACCAAAAACG 2340

AAACTGTCAA TATGGTAGTT AAGT7TTTAC TCAATGAAAT CATGCCTCGA CATGGGCTGC 2400

CTGTTTGGCA TAGGGTGTGA TAATGGACCG GCCTTCGGCT TGTCTATAGT TTAGTGAGTG 2460

AGTAAGGGGT TAAACATTGA ATGGAAGCTC CATTGTCCCT ATCGACCCGA GAGCTCTGGG 2520 

CAAGTAGAAC GCATGAACTG CACCCTAAAA AACACTCTTA CAAAATTAAT CTTAGAAACC 2580 
GGTCTAAATT GTGTAAGTCT CGTTCeTTTA GCCCTACTTA GAGTAAGGTG (^GCTTAC»264(j f ^ 

TGGGCTGGGT TCTTACCTTT TGAAATCATG TATGGGAGGG TGCTGCCTAT CTTGCCTAAG 2700 

CTAAGAGATG CCCAATTGGC AAAAATATCA CAAACTAATT TATTACAGTA CCTACAGTCT 2760 
CCCCAACAGG TACAAGATAT CATCCTGCCA CTTGTTCGAG GAACGCATCC CAATCCAATT 2820

CCTGAACAGA CAGGGCCCTG CCATTCATTC CCGCCAGGTG ACCTGTTGTT TGTTAAAAAG 2880 

nCCAGAGAG AAGGACTCCC TCCTGCTTGG AAGAGACCTC ACACCGTCAT CACGATGCCA 2940 

ACGGCTCTGA AGGTGGATGG CATTCCIGQG TGGATTCATC ACTGCCGCAT CAAAAAGGCC 3000 

AACAGAGCCC AACTAGAAAG ATGGGTCCCC AGGGCTGGGT CAGGCCCCTT AAAACTGCAG 3060

CTAAGTTGGG TGAAGCGATT AGATTMTTC TTTTTGTTAA TTTTGTAAAA CAATGCATAG 3120

CTTCTGJCAA ACTTATGTAT CTTAAGACTC MTATMCCC/ CCTTGTTATA . ACTGAGG j.- 3180 ) 

CAATGATTTG ATTCCCCCAA AMCACAAGT GGGGAATGTA GTGTGGAACC TGGTTTrrAG 3240

TAACCCTGTT TTTAGAGTCT CCGTTTCGTT TAATCACTCA GCTTGTTTCC ACGTGAATTG 3300 

ACTCTCGGTT AGCTAAGAGC GCCAGAIG6A CTCGATCTTG GCTC1TTGAC TGGCAGCCGC 3360 

TTCCTCAAGG ACTTAACTTG TGCAAGCTGA CTCCCAGCAG ATGCAAGAAT GCAATTAAGT 3420 

GATAAGATAC TGTGGCAAGC TATATCCGCA GTTCCCAGGA ATTCGTCCAA TTGATCACAG 3480 

CCCCTCTACC CTTCAGCAAC CACCACCCTG ATCAGTCAGC AGCCATCAGC ACGGAGGCAA 3540 

GGCCCTCCAC CAGCAAAAAG ATTCTGACTC ACTGAAGACT TGGAT6ATCA TTAGTATTTT 3600 

TAGCAGTAAA GTTTTTTTTT CTTTTTGTTT CTTTTTTTCT CGTGGC 3646 

(2) INFORMATION FOR SEQ ID NO:13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 10 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
CCTCAACCTC 


(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: 
ATGGCTATTT TCGGGGGCTG ACA 

23 

(2) INFORMATION FOR SEQ ID NO:15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

CCGGTATCTC CTCGTGGGTA TT 

22 



(2) INFORMATION FOR SEQ ID NO: 16: 






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 



CTTCAACCTC 



10 



(2) INFORMATION FOR SEQ ID NO: 17: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 
CTGCCTGAGC CACAAATG 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CCGGAGGAGG AAGCTAGAGG AATA 
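
Each nucleotide entry in the listing declares a LENGTH and then gives the sequence itself, so the two can be checked against each other. The following is a minimal Python sketch of such a check for SEQ ID NO:13 to SEQ ID NO:18. The names `LISTING` and `check_listing` are hypothetical and not part of the specification, and the sequence strings were transcribed by hand from the listing, so every transcription choice is an assumption of this sketch.

```python
# Declared LENGTH (base pairs) and sequence text for SEQ ID NO:13-18.
# Strings are hand-transcribed from the listing (an assumption of this sketch).
LISTING = {
    13: (10, "CCTCAACCTC"),
    14: (23, "ATGGCTATTT TCGGGGGCTG ACA"),
    15: (22, "CCGGTATCTC CTCGTGGGTA TT"),
    16: (10, "CTTCAACCTC"),
    17: (18, "CTGCCTGAGC CACAAATG"),
    18: (24, "CCGGAGGAGG AAGCTAGAGG AATA"),
}


def check_listing(listing):
    """Return SEQ ID numbers whose length or alphabet is inconsistent."""
    problems = []
    for seq_id, (declared_length, text) in listing.items():
        bases = text.replace(" ", "").upper()
        has_bad_symbol = bool(set(bases) - set("ACGT"))
        if len(bases) != declared_length or has_bad_symbol:
            problems.append(seq_id)
    return problems


print(check_listing(LISTING))
```

An empty list means every declared length agrees with its sequence and only the letters A, C, G and T occur.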

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

(2) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

Ser Ser Gly Gly Arg Thr Phe Asp Asp Phe His Arg Tyr Leu Leu Val 
1 5 10 15 

Gly Ile 



(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 

Gln Gly Ala Ala Gln Lys Pro Ile Asn Leu Ser Lys Xaa Ile Glu Val 
1 5 10 15 

Val Gln Gly His Asp Glu 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Pro Gly Val Phe Leu Glu His Leu Gln Glu Ala Tyr Arg Ile Tyr 
5 10 15 

Thr Pro Phe Asp Leu Ser Ala 

20 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

Tyr Leu Leu Val Gly Ile Gln Gly Ala 
1 5 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Gly Ala Ala Gln Lys Pro Ile Asn Leu 
1 5 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

Asn Leu Ser Lys Xaa Ile Glu Val Val 
1 5 

(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Glu Val Val Gln Gly His Asp Glu Ser 
1 5 

(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

His Leu Gln Glu Ala Tyr Arg Ile Tyr 
1 5 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 

Asn Leu Ala Phe Val Ala Gln Ala Ala 
1 5 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 

Phe Val Ala Gln Ala Ala Pro Asp Ser 
1 5 





WO 97/25431 



PCT/US97/00398 






Claims 



1. An isolated DNA molecule, comprising: 

(a) a human endogenous retroviral sequence, wherein said retroviral 
sequence is preferentially expressed in a tumor tissue; 

(b) a variant of said human endogenous retroviral sequence that contains 
one or more nucleotide substitutions, deletions, insertions and/or modifications at no more 
than 20% of the nucleotide positions, such that the antigenic and/or immunogenic properties 
of the polypeptide encoded by the human endogenous retroviral sequence are retained; or 

(c) a nucleotide sequence encoding an epitope of a polypeptide encoded 
by at least one of the above sequences. 



2. An isolated DNA molecule encoding an epitope of a polypeptide, 
wherein said polypeptide is encoded by: 

(a) a nucleotide sequence transcribed from the sequence of SEQ ID 

(b) a variant of said nucleotide sequence that contains one or more 
nucleotide substitutions, deletions, insertions and/or modifications at no more than 20% of 
the nucleotide positions, such that the antigenic and/or immunogenic properties of the 
polypeptide encoded by the nucleotide sequence are retained. 

3. A recombinant expression vector comprising a DNA molecule 
according to claim 1 or claim 2. 

4. A host cell transformed or transfected with an expression vector 
according to claim 3. 

5. A polypeptide comprising an amino acid sequence encoded by a DNA 
molecule according to claim 1 or claim 2. 



6. A monoclonal antibody that binds to a polypeptide according to 
claim 5. 

7. A method for determining the presence of a cancer in a patient 
comprising detecting, within a biological sample obtained from a patient, a polypeptide 
according to claim 5, and therefrom determining the presence of cancer in the patient. 

8. The method of claim 7 wherein the biological sample is a tumor 
sample. 

9. The method of claim 7 wherein the step of detecting comprises 
contacting the biological sample with a monoclonal antibody according to claim 6. 

10. The method of claim 7 wherein the polypeptide comprises an amino 
acid sequence encoded by a human endogenous retroviral sequence selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 

11. A method for determining the presence of a cancer in a patient 
comprising detecting, within a biological sample obtained from a patient, an RNA molecule 
encoding a polypeptide according to claim 5, and therefrom determining the presence of 
cancer in the patient. 

12. The method of claim 11 wherein the biological sample is a tumor 
sample. 

13. The method of claim 11 wherein the step of detecting comprises: 

(a) preparing cDNA from RNA molecules within the biological sample; 
and 

(b) specifically amplifying cDNA molecules that are capable of encoding 
at least a portion of a polypeptide according to claim 5. 

14. The method of claim 11 wherein the polypeptide comprises an amino 
acid sequence encoded by a human endogenous retroviral sequence selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 

15. A polypeptide according to claim 5 for use within a method for 
detecting the presence of a cancer in a patient. 

16. The polypeptide of claim 15 wherein the polypeptide comprises an 
amino acid sequence encoded by a human endogenous retroviral sequence selected from the 
group consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 

17. A method for monitoring the progression of a cancer in a patient, 
comprising: 

(a) detecting an amount, in a biological sample obtained from a patient, of 
a polypeptide according to claim 5; 

(b) subsequently repeating step (a); and 

(c) comparing the amounts of polypeptide detected in steps (a) and (b), 
and therefrom monitoring the progression of cancer in the patient. 

18. The method of claim 17 wherein the biological sample is a tumor 
sample. 

19. The method of claim 17 wherein the step of detecting comprises 
contacting a portion of the biological sample with a monoclonal antibody according to 
claim 6. 

20. The method of claim 17 wherein the polypeptide comprises an amino 
acid sequence encoded by a human endogenous retroviral sequence selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 

21. A method for monitoring the progression of a cancer in a patient, 
comprising: 

(a) detecting an amount, within a biological sample obtained from a 
patient, of an RNA molecule encoding a polypeptide according to claim 5; 

(b) subsequently repeating step (a); and 

(c) comparing the amounts of RNA molecules detected in steps (a) and 
(b), and therefrom monitoring the progression of cancer in the patient. 

22. The method of claim 21 wherein the step of detecting comprises: 

(a) preparing cDNA from RNA molecules within the biological sample; 

and 

(b) specifically amplifying cDNA molecules that are capable of encoding 
at least a portion of a polypeptide according to claim 5. 

23. The method of claim 21 wherein the polypeptide comprises an amino 
acid sequence encoded by a human endogenous retroviral sequence selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:3 - SEQ ID NO:10 and SEQ ID NO:12. 

24. A pharmaceutical composition, comprising: 

(a) a polypeptide according to claim 5; and 

(b) a physiologically acceptable carrier. 

25. A vaccine, comprising: 

(a) a polypeptide according to claim 5; and 

(b) an immune response enhancer. 

26. A diagnostic kit comprising: 

(a) one or more monoclonal antibodies according to claim 6; and 

(b) a detection reagent. 

27. The kit of claim 26 wherein the monoclonal antibody(s) are 
immobilized on a solid support. 

28. A diagnostic kit comprising a first polymerase chain reaction primer 
and a second polymerase chain reaction primer, the first and second primers each comprising 

at least about lG *o^^ molecule encoding polypeptide 

according to claim 5. 

29. A diagnostic kit comprising at least one oligonucleotide probe, the 
oligonucleotide probe of a DNA 

molecule according to claim 1 or claim 2. 
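
Claims 1(b) and 2(b) limit a variant to changes at no more than 20% of the nucleotide positions. The Python sketch below shows one way that numerical limit could be checked for two sequences that are already aligned to the same length. The function names are hypothetical, and the sketch assumes that insertions and deletions have been represented beforehand (for example as gap characters), since the claims do not prescribe an alignment method. It does not address the separate requirement that antigenic and/or immunogenic properties be retained, which can only be established experimentally.

```python
def fraction_changed(reference: str, candidate: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    ref = reference.replace(" ", "").upper()
    cand = candidate.replace(" ", "").upper()
    if not ref:
        raise ValueError("sequences must not be empty")
    if len(ref) != len(cand):
        raise ValueError("sequences must be pre-aligned to the same length")
    mismatches = sum(1 for a, b in zip(ref, cand) if a != b)
    return mismatches / len(ref)


def within_variant_limit(reference: str, candidate: str,
                         limit: float = 0.20) -> bool:
    """True when no more than `limit` of the positions differ."""
    return fraction_changed(reference, candidate) <= limit


# Illustration only: SEQ ID NO:13 and SEQ ID NO:16 from the sequence listing
# are used as convenient short strings, not as a claimed variant pair.
seq_13 = "CCTCAACCTC"
seq_16 = "CTTCAACCTC"
print(fraction_changed(seq_13, seq_16))
print(within_variant_limit(seq_13, seq_16))
```

The two ten-base strings differ at a single position, so they fall inside the 20% limit in this purely arithmetic sense.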
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NUCLEOTIDE SEQUENCE OF THE REPRESENTATIVE 
BREAST-TUMOR SPECIFIC cDNA B18Ag1 



TTA GAG ACC CAA TTG GGA CCT AAT TGG GAC CCA AAT TTC TCA AGT GGA 48 
Leu Glu Thr Gln Leu Gly Pro Asn Trp Asp Pro Asn Phe Ser Ser Gly 
1 5 10 15 

GGG AGA ACT TTT GAC GAT TTC CAC CGG TAT CTC CTC GTG GGT ATT CAG 96 
Gly Arg Thr Phe Asp Asp Phe His Arg Tyr Leu Leu Val Gly Ile Gln 
20 25 30 

GGA GCT GCC CAG AAA CCT ATA AAC TTG TCT AAG GCG ATT GAA GTC GT^ 144 
Gly Ala Ala Gln Lys Pro Ile Asn Leu Ser Lys Ala Ile Glu Val Val 
35 40 45 

CAG GGG CAT GAT GAG TCA CCA GGA GTG TTT TTA GAG CAC CTC CAG GAG 192 
Gln Gly His Asp Glu Ser Pro Gly Val Phe Leu Glu His Leu Gln Glu 
50 55 60 

GCT TAT CGG ATT TAC ACC CCT TTT GAC CTG GCA GCC CCC GAA AAT AGC 240 
Ala Tyr Arg Ile Tyr Thr Pro Phe Asp Leu Ala Ala Pro Glu Asn Ser 
65 70 75 80 

CAT GCT CTT AAT TTG GCA TTT GTG GCT CAG GCA GCC CCA GAT AGT AAA 288 
His Ala Leu Asn Leu Ala Phe Val Ala Gln Ala Ala Pro Asp Ser Lys 
85 90 95 

AGG AAA CTC CAA AAA CTA GAG GGA TTT TGC TGG AAT GAA TAC CAG TCA 336 
Arg Lys Leu Gln Lys Leu Glu Gly Phe Cys Trp Asn Glu Tyr Gln Ser 
100 105 110 

GCT TTT AGA GAT AGC CTA AAA GGT TTT 363 
Ala Phe Arg Asp Ser Leu Lys Gly Phe 
115 120 



Fig. 6 
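
The pairing of codons and amino acids in Fig. 6 can be checked by translating the nucleotide rows with the standard genetic code. The Python sketch below does this for the first two rows (nucleotides 1 to 96) and then looks up two entries from the sequence listing: the peptide of SEQ ID NO:20, written in one-letter code, and the oligonucleotide of SEQ ID NO:15. The names `translate` and `FIG6_START` are hypothetical, and using the standard code (NCBI translation table 1) in reading frame 1 is an assumption of this sketch that happens to match the figure layout.

```python
# Standard genetic code in T, C, A, G base order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with unreadable bases become 'X'."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First two rows of Fig. 6 (nucleotides 1-96), transcribed by hand.
FIG6_START = (
    "TTA GAG ACC CAA TTG GGA CCT AAT TGG GAC CCA AAT TTC TCA AGT GGA "
    "GGG AGA ACT TTT GAC GAT TTC CAC CGG TAT CTC CTC GTG GGT ATT CAG"
)

protein = translate(FIG6_START)
seq_id_20 = "SSGGRTFDDFHRYLLVGI"      # SEQ ID NO:20 in one-letter code
seq_id_15 = "CCGGTATCTCCTCGTGGGTATT"  # SEQ ID NO:15

print(protein)
print(protein.find(seq_id_20))                      # 0-based residue offset
print(FIG6_START.replace(" ", "").find(seq_id_15))  # 0-based nucleotide offset
```

A non-negative offset from `find` indicates that the listed peptide or oligonucleotide occurs within this stretch of the Fig. 6 sequence; `-1` would indicate no exact match.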
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